Iterative Decoding of Low-Density Parity Check Codes

(An Introductory Survey) 

Venkatesan Guruswami

Department of Computer Science and Engineering 
University of Washington 
Seattle, WA 98195 

September 2006 



Abstract 

Much progress has been made on decoding algorithms for error-correcting codes in the last 
decade. In this article, we give an introduction to some fundamental results on iterative, 
message-passing algorithms for low-density parity check codes. For certain important stochastic 
channels, this line of work has enabled getting very close to Shannon capacity with algorithms 
that are extremely efficient (both in theory and practice). 

1 Introduction



Over the past decade or so, there has been substantial new progress on algorithmic aspects of coding 
theory. A (far from exhaustive) list of the themes that have witnessed intense research activity 
includes: 

1. A resurgence of interest in the long forgotten class of low-density parity check (LDPC) codes 
and on iterative, message-passing decoding algorithms for them, which has resulted in codes 
with rates extremely close to Shannon capacity together with efficient decoding algorithms. 

2. Linear time encodable/decodable error-correcting codes (based on expanders) for worst-case 
errors. 

3. List decoding algorithms which correct many more worst-case errors beyond the "half-the- 
code-distance" bound, and which can achieve capacity even against adversarial noise. 1 

Of course there are some interrelations between the above directions; in particular, progress on 
linear-time encodable/decodable codes is based on expander codes, which are LDPC codes with 
additional properties. Also, list decoding algorithms that run in linear time and correct a fraction 
p of errors for any desired p < 1 have been developed using expander-based ideas [12].

Of the above lines of work, the last two have a broader following in the theoretical computer 
science community, due to their focus on the combinatorial, worst-case noise model and the ex- 
traneous applications of such codes in contexts besides communication (such as pseudorandomness 
and average-case complexity). The sister complexity theory column that appears in SIGACT news 
featured recent surveys on both these topics [HIES]. A longer survey on very recent developments
in list decoding of algebraic codes will appear in [10]. A very brief survey featuring a couple of
complexity-theoretic uses of list decoding appears in [11]. Applications of coding theory to
complexity theory, especially those revolving around sub-linear algorithms, are surveyed in detail in

We use the opportunity provided by this column to focus on the first line of work on iterative 
(also called message-passing or belief propagation) algorithms for decoding LDPC codes. This is in 
itself a vast area with numerous technically sophisticated results. For a comprehensive discussion of 
this area, we point the reader to the upcoming book by Richardson and Urbanke [25], which is an
excellent resource on this topic. The February 2001 issue of Volume 47 of the IEEE Transactions on 
Information Theory is another valuable resource — this was a special issue dedicated to iterative 
decoding and in particular contains the series of papers ^Sl El E31 [22] . This sequence of papers 
is arguably one of the most important post-Gallager developments in the analysis of iterative 
decoding, and it laid down the foundations for much of the recent progress in this field. 

Disclaimer: The literature on the subject of LDPC and related codes and belief propagation 
algorithms is vast and diverse, and the author, not having worked on the topic himself, is only 
aware of a small portion of it. Our aim will be to merely provide a peek into some of the basic 
context, results, and methods of the area. We will focus almost exclusively on LDPC codes, and 
important related constructions such as LT codes, Raptor codes, Repeat-Accumulate codes, and 

1 The capacity-achieving part was recently shown for codes over large alphabets; specifically, explicit codes of rate
close to 1 - p that can be list decoded in polynomial time from a fraction p of errors were constructed in [14]. For
binary codes, the capacity for decoding a fraction p of errors equals 1 - H(p), but we do not know how to achieve
this constructively.

turbo codes are either skipped or only very briefly mentioned. While the article should (hopefully)
be devoid of major technical inaccuracies, we apologize for any inappropriate omissions in credits 
and citations (and welcome comments from the reader if any such major omissions are spotted). 

Organization: We begin with some basic background information concerning LDPC codes, the
channel models we will study, and the goal of this line of study in Section 2. In Section 3 we
discuss how concatenated codes with an outer code that can correct a small fraction of errors can
be used to approach capacity, albeit with a poor dependence on the gap to capacity. We then turn
to message passing algorithms for LDPC codes and describe their high level structure in Section 4.
With this in place, we develop and analyze some specific message passing algorithms for regular
LDPC codes in Section 5, establishing theoretical thresholds for the binary erasure and binary
symmetric channels. We then turn our focus to irregular LDPC codes in Section 6 and discuss,
among other things, how one can use them to achieve the capacity of the binary erasure channel.
Finally, in Section 7 we discuss how one can achieve linear encoding time for LDPC codes, and also
discuss a variant called Irregular Repeat-Accumulate (IRA) codes that are linear-time encodable
by design and additionally offer improved complexity-vs-performance trade-offs.

2 Background 

2.1 Linear and LDPC codes 

We will focus exclusively on binary linear codes. A binary linear code C of block length n is a 
subspace of $\mathbb{F}_2^n$, where $\mathbb{F}_2 = \{0, 1\}$ is the field with two elements. The rate of C, denoted R(C),
equals k/n where k is the dimension of C (as a vector space over $\mathbb{F}_2$); such a code is also referred to
as an [n, k] code. Being a linear subspace of dimension k, the code C can be described as the kernel
of a matrix $H \in \mathbb{F}_2^{(n-k) \times n}$, so that $C = \{c \in \mathbb{F}_2^n \mid Hc = 0\}$ (we treat codewords c as column vectors
for this description). The matrix H is called the parity check matrix of the code C. In general, any
choice of H whose rows form a basis of the dual space $C^{\perp} = \{x \in \mathbb{F}_2^n \mid x^T c = 0 \ \forall c \in C\}$ describes
the same code. Of special interest to us here are codes that admit a sparse parity check matrix.
In particular, we will study low-density parity check (LDPC) codes, which were introduced and
studied in Gallager's amazing work [8] that was way ahead of its time. LDPC codes are described
by a parity check matrix all of whose rows and columns have at most a fixed constant number of
1's (the constant is independent of n). 2

A convenient way to describe an LDPC code is in terms of its factor graph. 3 This is a natural
bipartite graph defined as follows. On the left side are n vertices, called variable nodes, one for 
each codeword position. On the right are m = n — k vertices, called check nodes, one for each 
parity check (row of the parity check matrix). A check node is adjacent to all variable nodes whose 
corresponding codeword symbols appear in this parity check. In other words, the parity check 
matrix of the code is precisely the bipartite adjacency matrix of the factor graph. 
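
As a small illustration of these definitions, the following Python sketch builds the factor graph from a parity check matrix and tests membership in the code. The matrix used here is the parity check matrix of the [7, 4] Hamming code, chosen by us purely as a toy example (it is not meant to be "low density" in any asymptotic sense), and the helper names `factor_graph` and `is_codeword` are hypothetical.

```python
# Toy parity check matrix (the [7, 4] Hamming code); an assumption made
# for illustration only, any 0/1 matrix can be used instead.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def factor_graph(H):
    """Return adjacency lists (check_nbrs, var_nbrs) of the bipartite factor graph."""
    m, n = len(H), len(H[0])
    check_nbrs = [[j for j in range(n) if H[i][j]] for i in range(m)]
    var_nbrs = [[i for i in range(m) if H[i][j]] for j in range(n)]
    return check_nbrs, var_nbrs

def is_codeword(H, c):
    """c is a codeword iff every parity check sums to 0 over F_2."""
    return all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H)

check_nbrs, var_nbrs = factor_graph(H)
n, m = len(H[0]), len(H)
rate = (n - m) / n  # equals k/n when the rows of H are linearly independent
```

Here `check_nbrs[i]` lists the variable nodes adjacent to check node i, which is exactly the support of row i of H.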

A special class of LDPC codes are regular LDPC codes where the factor graph is both left-regular 
and right-regular. Regular LDPC codes were in fact the variant originally studied by Gallager 
as well as in the works of MacKay and Neal [18] and Sipser and Spielman [29] that sparked

2 We will throughout be interested in a family of codes of increasing block length n with rate k/n held a fixed 
constant. For convenience, we don't spell this out explicitly, but this asymptotic focus should always be kept in mind. 

3 This graphical representation applies for any linear code. But the resulting graph will be sparse, and hence 
amenable to linear time algorithms, only for LDPC codes. 

the resurgence of interest in LDPC codes after over 30 years since Gallager's work. 4 LDPC codes
based on non-regular graphs, called irregular LDPC codes, rose to prominence beginning in the 
work of Luby et al ^3 El (studying codes based on irregular graphs was one of the big conceptual 
leaps made in these works). We will return to this aspect later in the survey. A popular choice of 
regular LDPC codes (with a rate of 1/2) are (3, 6)-regular LDPC codes where variable nodes have 
degree 3 and check nodes have degree 6. 
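
The rate of 1/2 can be checked by counting edges of the factor graph. The worked step below is ours; it gives the rate exactly when the parity checks are linearly independent, and a lower bound on the rate otherwise. With n variable nodes of degree $d_v$ and m check nodes of degree $d_c$:

```latex
n \, d_v = m \, d_c
\;\Longrightarrow\;
m = \frac{n \, d_v}{d_c},
\qquad
R \;\ge\; 1 - \frac{m}{n} \;=\; 1 - \frac{d_v}{d_c} \;=\; 1 - \frac{3}{6} \;=\; \frac{1}{2}.
```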

2.2 Channel models and their capacity 

Design of good LDPC codes, together with progress in analyzing natural message-passing algo- 
rithms for decoding them, has led to rapid progress towards approaching the capacity of important 
stochastic channels. We now review the main noise models that we will be interested in. 

Throughout, we deal with binary codes only. We will find it convenient to use {+1, -1} (instead
of {0, 1}) for the binary alphabet, where +1 corresponds to the bit 0 and -1 to the bit 1. Note that the
XOR operation becomes multiplication in the ±1 notation.
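
The correspondence between XOR and multiplication can be verified exhaustively. The short Python check below is only a sanity test; the helper name `to_pm1` is ours.

```python
from itertools import product

def to_pm1(bit):
    """Map bit 0 to +1 and bit 1 to -1."""
    return 1 - 2 * bit

# XOR of bits corresponds to the product of their +/-1 images.
xor_is_product = all(
    to_pm1(a ^ b) == to_pm1(a) * to_pm1(b)
    for a, b in product((0, 1), repeat=2)
)
```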

We will assume the channel's operation to be memoryless, so that each symbol of the codeword 
is distorted independently according to the same channel law. So to specify the noise model, it 
suffices to specify how the noise distorts a single input symbol. For us the input symbol will always 
be either ±1, and so the channels have as input alphabet $\mathcal{X} = \{1, -1\}$. Their output alphabet will
be denoted by $\mathcal{Y}$ and will be different for the different channels. Upon transmission of a codeword
$c \in \mathcal{X}^n$, the word y observed by the receiver belongs to $\mathcal{Y}^n$. The receiver must then decode y and
hopefully compute the original transmitted codeword c. The challenge is to achieve a vanishingly 
small error probability (i.e., the probability of either a decoding failure or an incorrect decoding), 
while at the same time operating at a good rate, hopefully close to the capacity of the channel. 

We begin with the simplest noise model, the Binary Erasure Channel (BEC). This is parameterized
by a real number $\alpha$, $0 < \alpha < 1$. The output alphabet is $\mathcal{Y} = \{1, -1, ?\}$, with ? signifying
an erasure. Upon input $x \in \mathcal{X}$, the channel outputs x with probability $1 - \alpha$, and outputs ? with
probability $\alpha$. The value $\alpha$ is called the erasure probability, and we denote by $\mathrm{BEC}_\alpha$ the BEC with
erasure probability $\alpha$. For large n, the received word consists of about $(1 - \alpha)n$ unerased symbols
with high probability, so the maximum rate at which reliable communication is possible is at most
$(1 - \alpha)$ (this holds even if the transmitter and receiver knew in advance which bits will be erased).
It turns out this upper bound can be achieved, and Elias [5], who first introduced the BEC, also
proved that its capacity equals $(1 - \alpha)$.

The Binary Symmetric Channel (BSC) is parameterized by a real number p, $0 < p < 1/2$, and
has output alphabet $\mathcal{Y} = \{1, -1\}$. On input $x \in \mathcal{X}$, the channel outputs bx where $b = -1$ with
probability p and $b = 1$ with probability $1 - p$. The value p is called the crossover probability. The
BSC with crossover probability p is denoted by $\mathrm{BSC}_p$. The capacity of $\mathrm{BSC}_p$ is well known to be
$1 - H(p)$, where $H(p) = -p \lg p - (1 - p) \lg(1 - p)$ is the binary entropy function.
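
For concreteness, the two capacity formulas above can be evaluated numerically. The following Python sketch is ours (the function names are hypothetical) and simply transcribes the capacity 1 - α of the BEC and 1 - H(p) of the BSC.

```python
import math

def binary_entropy(p):
    """H(p) = -p lg p - (1 - p) lg(1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bec_capacity(alpha):
    """Capacity of the binary erasure channel with erasure probability alpha."""
    return 1.0 - alpha

def bsc_capacity(p):
    """Capacity of the binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)
```

For example, `bsc_capacity(0.11)` is very close to 1/2, so a rate 1/2 code cannot be used reliably on a BSC whose crossover probability is much above 0.11.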

Finally, we mention a channel with continuous output alphabet $\mathcal{Y}$ called Binary Input Additive
White Gaussian Noise (BIAWGN). Here $\mathcal{Y}$ equals the set of real numbers, and the channel operation
is modeled as $y = x + z$ where $x \in \{\pm 1\}$ is the input and z is a normal variable with mean 0 and

4 In the long interim period, LDPC codes went into oblivion, with the exception of two (known to us) works. Zyablov
and Pinsker [35] proved that for random LDPC codes, with high probability over the choice of the code, Gallager's
algorithm corrected a constant fraction of worst-case errors. Tanner [33] presented an important generalization of
Gallager's construction and his decoding algorithms, which was later important in the work on linear time decodable
expander codes [29].

variance $\sigma^2$ (i.e., has probability density function $p(z) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-z^2/(2\sigma^2)}$). We denote by $\mathrm{BIAWGN}_\sigma$ the
BIAWGN with variance $\sigma^2$; its capacity is a function of $1/\sigma^2$ alone, though there is no elementary
closed-form expression known for the capacity (but it can be expressed as an integral that can be estimated
numerically). For rate 1/2, the largest $\sigma$ (Shannon limit) for which reliable communication on the
BIAWGN channel is possible is (up to the precision given) $\sigma_{\mathrm{opt}} = 0.9787$.

More generally, if we allow scaling of inputs, the capacity is a function of the "signal-to-noise"
ratio $E_N/\sigma^2$, where $E_N$ is the energy expended per channel use. If the inputs to the channel are
not constrained to be ±1, but instead can take arbitrary real values, then it is well known that
the capacity of the AWGN channel equals $\frac{1}{2}\log_2(1 + E_N/\sigma^2)$ bits per channel use. In particular,
in order to achieve reliable communication at a rate of 1/2 over the real-input AWGN channel,
a signal-to-noise ratio of 1, or 0 dB, is required. 5 For the BIAWGN channel, this ratio increases
to $1/\sigma_{\mathrm{opt}}^2 = 1.044$, or 0.187 dB. Accordingly, the yardstick to measure the quality of a decoding
algorithm for an LDPC code of rate 1/2 is how close to this limit it can lead to correct decoding
with probability tending to 1 (over the realization of the BIAWGN channel noise).

The continuous output of a BIAWGN channel can be quantized to yield a discrete approximation
to the original value, which can then be used in decoding. (Of course, this leads to loss in
information, but is often done for considerations of decoding complexity.) A particularly simple
quantization is to decode a signal x into 1 if $x \ge 0$ and into -1 if $x < 0$. This effectively converts an
AWGN channel with variance $\sigma^2$ into a BSC with crossover probability $Q(1/\sigma) = \frac{1}{\sqrt{2\pi}} \int_{1/\sigma}^{\infty} e^{-x^2/2} \, dx$.
It should not come as a surprise that the capacity of the resulting BSC falls well short of the capacity
of the BIAWGN.
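
To get a feel for the size of this loss, the following Python sketch (ours, with hypothetical helper names) evaluates the Gaussian tail Q via the complementary error function and computes the capacity of the BSC obtained by sign quantization at the rate-1/2 Shannon limit σ = 0.9787 quoted above. It also recomputes the corresponding signal-to-noise ratio in decibels.

```python
import math

def q_function(t):
    """Gaussian tail Q(t) = P(N(0, 1) > t), via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def binary_entropy(p):
    """H(p) = -p lg p - (1 - p) lg(1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

sigma_opt = 0.9787                       # value quoted in the text for rate 1/2
snr = 1 / sigma_opt ** 2                 # signal-to-noise ratio for +/-1 inputs
snr_db = 10 * math.log10(snr)            # the same ratio in decibels
p_hard = q_function(1 / sigma_opt)       # crossover probability after sign quantization
bsc_cap = 1 - binary_entropy(p_hard)     # capacity of the resulting BSC
```

At this noise level the quantized channel has crossover probability a little above 0.15 and capacity roughly 0.38, noticeably below the rate 1/2 that the unquantized BIAWGN channel supports.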

All the above channels have the following output-symmetry property: For each possible channel
output q, $p(y = q \mid x = 1) = p(y = -q \mid x = -1)$. (Here $p(y \mid x)$ denotes the conditional probability
that the channel output equals y given the channel input is x.)

We will focus a good deal of attention on the BEC. Being a very simple channel, it serves as a 
good warm-up to develop the central ideas, and at the same time achieving capacity on the BEC 
with iterative decoding of LDPC codes is technically non-trivial. The ideas which were originally 
developed for erasure codes in [16] have been generalized for more general channels, including the
BSC and BIAWGN, with great success [I3ESIl22| ■ Yet, to date the BEC is the only channel known 
for which one can provably get arbitrarily close to capacity via iterative decoding of (an ensemble 
of) LDPC codes. So naturally, given our focus on the theoretical aspects, the BEC is of particular 
interest. 

2.3 Spirit of the results 

The central goal of research in channel coding is the following: given a particular channel, find a 
family of codes which have fast (ideally linear-time) encoding algorithms and which can be reliably 
decoded in linear time at rates arbitrarily close to channel capacity. This is, of course, also the goal 
of the line of work on LDPC codes. 

In "practice" one of the things that seems to get people excited are plots of the signal-to-noise 
ratio (SNR) vs bit error probability (BER) for finite-length codes found by non-trivial optimization 
based on theoretical insights, followed by simulation on, say, the BIAWGN channel. Inspired by the 
remarkable success on the BEC [16], this approach was pioneered for LDPC codes in the presence

5 In decibel notation, $A > 0$ is equivalent to $10 \log_{10} A$ dB.

of errors in j 3 11 117j , culminating in the demonstration of codes for the BIAWGN channel in [22]
that beat turbo codes and get very close to the Shannon limit. 

Since this article is intended for a theory audience, our focus will be on the "worst" channel pa- 
rameter (which we call threshold) for which one can prove that the decoding will be successful with 
probability approaching 1 in the asymptotic limit as the block length grows to infinity. The rele- 
vant channel parameters for the BEC, BSC, and BIAWGN are, respectively, the erasure probability, 
crossover probability, and the variance of the Gaussian noise. The threshold is like the random 
capacity for a given code (or ensemble of codes) and a particular decoder. Normally for studying 
capacity we fix the channel and ask what is the largest rate under which reliable communication 
is possible, whereas here we fix the rate and ask for the worst channel under which probability of 
miscommunication tends to zero. Of course, the goal is to attain as large a threshold as possible,
ideally approaching the Shannon limit (for example, $1 - \alpha$ for $\mathrm{BEC}_\alpha$ and $1 - H(p)$ for $\mathrm{BSC}_p$).

3 Simple concatenated schemes to achieve capacity on BEC and 
BSC 

We could consider the channel coding problem solved (at least in theory) on a given channel if we 
have explicit codes, with efficient algorithms for encoding and reliable decoding at rates within any 
desired $\varepsilon$ of capacity. Ideally, the run time of the algorithms should be linear in the block length
n, and also depend polynomially on $1/\varepsilon$. (But as we will see later, for certain channels like the
BEC, we can have a runtime of $O(n \log(1/\varepsilon))$, or even better $cn$ with c independent of $\varepsilon$, if we
allow randomization in the construction.) In this section, we discuss some "simple" attacks on this
problem for the BEC and BSC, why they are not satisfactory, and the basic challenges this raises 
(some of which are addressed by the line of work on LDPC codes). 

For the BEC, once we have the description of the generator matrix of a linear code that achieves 
capacity, we can decode in $O(n^3)$ time by solving a linear system (the decoding succeeds if the system
has a unique solution). Since a random linear code achieves capacity with high probability [5], we
can sample a random generator matrix, thus getting a code that works with high probability 
(together with a cubic time algorithm). However, we do not know any method to certify that the 
chosen code indeed achieves capacity. The drawbacks with this solution are the cubic time and 
randomized nature of the construction. 

A construction using concatenated codes gets around both these shortcomings. The idea originates
in Forney's work [7] that was the first to present codes approaching capacity with polynomial
time encoding and decoding algorithms. 

Let $\alpha$ be the erasure probability of the BEC and say our goal is to construct a code of rate
$(1 - \alpha - \varepsilon)$ that enables reliable communication on $\mathrm{BEC}_\alpha$. Let $C_1$ be a linear time encodable/decodable
binary code of rate $(1 - \varepsilon/2)$ that can correct a small constant fraction $\gamma = \gamma(\varepsilon) > 0$ of worst-case
erasures. Such codes were constructed in jHUip. For the concatenated coding, we do the following. 
For some parameter b, we block the codeword of $C_1$ into blocks of size b, and then encode each of
these blocks by a suitable inner binary linear code $C_2$ of dimension b and rate $(1 - \alpha - \varepsilon/2)$. The
inner code will be picked so that it achieves the capacity of the $\mathrm{BEC}_\alpha$, and specifically recovers the
correct message with success probability at least $1 - \gamma/2$. For $b = b(\varepsilon, \gamma) = \Omega\left(\frac{\log(1/\gamma)}{\varepsilon^2}\right)$, a random
code meets this goal with high probability, so we can find one by brute-force search (that takes
constant time depending only on $\varepsilon$).

The decoding proceeds as one would expect: first each of the inner blocks is decoded, by solving
a linear system, returning either decoding failure or the correct value of the block. (There are no
errors, so when successful, the decoder knows it is correct.) Since the inner blocks are chosen to be
large enough, each inner decoding fails with probability at most $\gamma/2$. Since the noise on different
blocks is independent, by a Chernoff bound, except with exponentially small probability, we have
at most a fraction $\gamma$ of erasures in the outer codeword. These are then handled by the linear-time
erasure decoder for $C_1$.

We conclude that, for the $\mathrm{BEC}_\alpha$, we can construct codes of rate $1 - \alpha - \varepsilon$, i.e., within $\varepsilon$ of
capacity, that can be encoded and decoded in $n/\varepsilon^{O(1)}$ time. While this is pretty good, the brute-force
search for the inner code is unsatisfying, and the BEC is simple enough that better runtimes
(such as $O(n \log(1/\varepsilon))$) are achieved by certain irregular LDPC codes.
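
The rate claim can be checked directly. The worked step below is ours: the rate of a concatenated code is the product of the rates of the outer code (here $1 - \varepsilon/2$) and the inner code (here $1 - \alpha - \varepsilon/2$), so

```latex
\left(1 - \frac{\varepsilon}{2}\right)\left(1 - \alpha - \frac{\varepsilon}{2}\right)
= 1 - \alpha - \frac{\varepsilon}{2} - \frac{\varepsilon}{2}(1 - \alpha) + \frac{\varepsilon^2}{4}
\;\ge\; 1 - \alpha - \varepsilon ,
```

where the inequality uses $(\varepsilon/2)(1 - \alpha) \le \varepsilon/2$ and $\varepsilon^2/4 \ge 0$.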

A similar approach can be used for the $\mathrm{BSC}_p$. The outer code $C_1$ must be picked so that it can
correct a small fraction of worst-case errors; again, such codes of rate close to 1 with linear time
encoding and decoding are known [30, 13]. Everything works as above, except that the decoding
of the inner codes, where we find the codeword of $C_2$ closest to the received block, requires a
brute-force search and this takes $2^b = 2^{\Omega(1/\varepsilon^2)}$ time. This can be improved to polynomial in $1/\varepsilon$
by building a look-up table, but then the size of the look-up table, and hence the space complexity
and time for precomputing the table, is exponential in $1/\varepsilon$.

In summary, for the $\mathrm{BSC}_p$, we can construct codes of rate $1 - H(p) - \varepsilon$, i.e., within $\varepsilon$ of capacity,
that can be encoded in $n/\varepsilon^{O(1)}$ time and which can be reliably decoded in $n 2^{1/\varepsilon^{O(1)}}$ time. It
remains an important open question to obtain such a result with decoding complexity $n/\varepsilon^{O(1)}$, or
even $\mathrm{poly}(n/\varepsilon)$. 6

We also want to point out that recently an alternate method using LP decoding has been used
to obtain polynomial time decoding at rates arbitrarily close to capacity [6]. But this also suffers
from a similar poor dependence on the gap $\varepsilon$ to capacity.

4 Message-passing iterative decoding: An abstract view 
4.1 Basic Structure 

We now discuss the general structure of natural message-passing iterative decoding algorithms, as 
discussed, for example, in [23]. In these algorithms, messages are exchanged between the variable
and check nodes in discrete time steps. Initially, each variable node $v_j$, $1 \le j \le n$, has an associated
received value $r_j$, which is a random variable taking values in the channel output alphabet $\mathcal{Y}$. Based
on this, each variable sends a message belonging to some message alphabet $\mathcal{M}$. A common choice
for this initial message is simply the received value $r_j$, or perhaps some quantized version of $r_j$
for continuous output channels such as BIAWGN. Now, each check node c processes the messages
it receives from its neighbors, and sends back a suitable message in $\mathcal{M}$ to each of its neighboring
variable nodes. Upon receipt of the messages from the check nodes, each variable node $v_j$ uses these
together with its own received value $r_j$ to produce new messages that are sent to its neighboring
check nodes. This process continues for many time steps, till a certain cap on the number of

6 We remark that asymptotically, with $\varepsilon$ fixed and $n \to \infty$, the exponential dependence on $1/\varepsilon$ can be absorbed into
an additional factor with a slowly growing dependence on n. However, since in practice one is interested in moderate
block length codes, say $n < 10^6$, a target runtime such as $O(n/\varepsilon)$ seems like a clean way to pose the underlying
theoretical question.

iterations is reached. In the analysis, we are interested in the probability of incorrect decoding,
such as the bit-error probability. For every time step i, $i \in \mathbb{N}$, the i'th iteration consists of a round of
check-to-variable node messages, followed by the variable nodes responding with their messages to
the check nodes. The 0'th iteration consists of dummy messages from the check nodes, followed by
the variable nodes sending their received values to the check nodes.

A very important condition in the determination of the next message based on the messages 
received from the neighbors is that the message sent by a node u along an edge e does not depend on the
message just received along edge e. This is important so that only "extrinsic" information is passed 
along from a node to its neighbor in each step. It is exactly this restriction that leads to the 
independence condition that makes analysis of the decoding possible. 

In light of the above restriction, the iterative decoding can be described in terms of the following
message maps: $\Psi_v^{(\ell)} : \mathcal{Y} \times \mathcal{M}^{d_v-1} \to \mathcal{M}$ for variable node v with degree $d_v$ for the $\ell$'th iteration,
$\ell \ge 1$, and $\Psi_c^{(\ell)} : \mathcal{M}^{d_c-1} \to \mathcal{M}$ for check node c with degree $d_c$. Note the message maps can be
different for different iterations, though several powerful choices exist where they remain the same 
for all iterations (and we will mostly discuss such decoders). Also, while the message maps can be 
different for different variable (and check) nodes, we will use the same map (except for the obvious 
dependence on the degree, in case of irregular graphs). 

The intuitive interpretation of messages is the following. A message is supposed to be an 
estimate or guess of a particular codeword bit. For messages that take ±1 values, the guess on 
the bit is simply the message itself. We can also add a third value, say 0, that would signify an 
erasure or abstention from guessing the value of the bit. More generally, messages can take values 
in a larger discrete domain, or even take continuous values. In these cases the sign of the message 
is the estimated value of the codeword bit, and its absolute value is a measure of the reliability or 
confidence in the estimated bit value. 



4.2 Symmetry Assumptions 

We have already discussed the output-symmetry condition of the channels we will be interested in, 
i.e., $p(y = q \mid x = 1) = p(y = -q \mid x = -1)$. We now mention two reasonable symmetry assumptions
on the message maps, which will be satisfied by the message maps underlying the decoders we 
discuss: 

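• Check node symmetry: Signs factor out of check node message maps, i.e., for all
$(b_1, \dots, b_{d_c-1}) \in \{1, -1\}^{d_c-1}$,

$$\Psi_c^{(\ell)}(b_1 m_1, \dots, b_{d_c-1} m_{d_c-1}) = \Psi_c^{(\ell)}(m_1, \dots, m_{d_c-1}) \left( \prod_{i=1}^{d_c-1} b_i \right).$$

• Variable node symmetry: If the signs of all messages into a variable node are flipped,
then the sign of its output gets flipped:

$$\Psi_v^{(\ell)}(-m_0, -m_1, \dots, -m_{d_v-1}) = -\Psi_v^{(\ell)}(m_0, m_1, \dots, m_{d_v-1}).$$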



When the above symmetry assumptions are fulfilled and the channel is output-symmetric, the 
decoding error probability is independent of the actual codeword transmitted. Indeed, it is not 
hard (see, for instance, [23, Lemma 1]) to show that when a codeword $(x_1, \dots, x_n)$ is transmitted
and $(y_1, \dots, y_n)$ is received where $y_i = x_i z_i$, the messages to and from the variable node $v_i$ are equal
to $x_i$ times the corresponding message when the all-ones codeword is transmitted and $(z_1, \dots, z_n)$
is received. Therefore, the entire behavior of the decoder can be predicted from its behavior 
assuming transmission of the all-ones codeword (recall that we are using {1,-1} notation for the 
binary alphabet). So, for the analysis, we will assume that the all-ones codeword was transmitted. 

5 Regular LDPC codes and simple iterative decoders 

We will begin with regular LDPC codes and a theoretical analysis of simple message-passing algo- 
rithms for decoding them. 

5.1 Gallager's program 

The story of LDPC codes and iterative decoding begins in Gallager's remarkable Ph.D. thesis 
completed in 1960, and later published in 1963 [8]. Gallager analyzed the behavior of a code
picked randomly from the ensemble of $(d_v, d_c)$-regular LDPC codes of a large block length. He
proved that with high probability, as $d_v$ and $d_c$ increase, the rate vs. minimum distance trade-off
of the code approaches the Gilbert-Varshamov bound. Gallager also analyzed the error probability
of maximum likelihood (ML) decoding of random $(d_v, d_c)$-regular LDPC codes, and showed that
LDPC codes are at least as good on the BSC as the optimum code of a somewhat higher rate (refer
to [8] for formal details concerning this statement). This demonstrated the promise of LDPC codes
independently of their decoding algorithms (since ML decoding is the optimal decoding algorithm 
in terms of minimizing error probability). 

To complement this statement, Gallager also proved a "negative" result showing that for each 
finite $d_c$, there is a finite gap to capacity on the BSC when using regular LDPC codes with check
node degrees $d_c$. More precisely, he proved that the largest rate that can be achieved for $\mathrm{BSC}_p$ with
error probability going to zero is at most $1 - \frac{H(p)}{H(p_{d_c})}$, where $p_{d_c} = \frac{1 + (1 - 2p)^{d_c}}{2}$. This claim holds even
for irregular LDPC codes with $d_c$ interpreted as the maximum check node degree. This shows that
the maximum check node degree needs to grow as the gap $\varepsilon$ between the rate of the code and
capacity of the BSC shrinks.
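
A quick numerical look at this bound makes the gap visible. The Python sketch below is ours and assumes the bound has the form $1 - H(p)/H(p_{d_c})$ with $p_{d_c} = (1 + (1 - 2p)^{d_c})/2$; the function names and the sample values p = 0.11 and $d_c \in \{6, 50\}$ are chosen only for illustration.

```python
import math

def binary_entropy(p):
    """H(p) = -p lg p - (1 - p) lg(1 - p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gallager_rate_bound(p, dc):
    """Upper bound on the rate for BSC_p when all check degrees are at most dc."""
    p_dc = (1 + (1 - 2 * p) ** dc) / 2
    return 1 - binary_entropy(p) / binary_entropy(p_dc)

p = 0.11
capacity = 1 - binary_entropy(p)          # close to 1/2
bound_dc6 = gallager_rate_bound(p, 6)     # strictly below capacity
bound_dc50 = gallager_rate_bound(p, 50)   # nearly indistinguishable from capacity
```

With check degree 6 the bound sits about 0.02 below capacity at p = 0.11, while for check degree 50 the gap is negligible, in line with the remark that the check degree must grow as the rate approaches capacity.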

Since only exponential time solutions to the ML decoding problem are known, Gallager also 
developed simple, iterative decoding algorithms for LDPC codes. These form the precursor to 
the modern day message-passing algorithms. More generally, he laid down the foundations of the 
following program for determining the threshold channel parameter below which a suitable LDPC 
code can be used in conjunction with a given iterative decoder for reliable information transmission. 

Code construction: Construct a family of $(d_v, d_c)$-regular factor graphs with n variable nodes
(for increasing n) with girth greater than $4t(n) = \Omega(\log n)$. An explicit construction of such
graphs was also given by Gallager [8, Appendix C].

Analysis of Decoder: Determine the average fraction of incorrect 7 messages passed at the i'th
iteration of decoding for $i \le t = t(n)$ (assuming there are no cycles of length at most $4t$).
This fraction is usually expressed by a system of recursive equations that depend on $d_v$, $d_c$
and the channel parameter (such as crossover probability, in case of the BSC).

7 A message is incorrect if the bit value it estimates is wrong. For transmission of the all-ones codeword, this 
means the message has a non-positive value. 

Threshold computation: Using the above equations, compute (analytically or numerically) the
threshold channel parameter below which the expected fraction of incorrect messages approaches
zero as the number of iterations increases. Conclude that the chosen decoder when
applied to this family of codes with t(n) decoding rounds leads to bit-error probability approaching
zero as long as the channel parameter is below the threshold.
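
To make the last two steps concrete, here is a minimal Python sketch for one specific case. It assumes the standard recursion for message-passing erasure decoding of $(d_v, d_c)$-regular codes on the BEC with erasure probability α, namely $x_{t+1} = \alpha (1 - (1 - x_t)^{d_c - 1})^{d_v - 1}$ with $x_0 = \alpha$, where $x_t$ is the fraction of erased variable-to-check messages in round t. This recursion is not derived in the text above, and the function names, iteration cap, and tolerances are our own choices.

```python
def bec_fraction_after(alpha, dv, dc, iterations):
    """Iterate x <- alpha * (1 - (1 - x)^(dc-1))^(dv-1), starting from x = alpha."""
    x = alpha
    for _ in range(iterations):
        x = alpha * (1.0 - (1.0 - x) ** (dc - 1)) ** (dv - 1)
    return x

def bec_threshold(dv, dc, iterations=5000, tol=1e-4):
    """Bisection for the largest alpha whose erased fraction is driven to (nearly) 0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bec_fraction_after(mid, dv, dc, iterations) < 1e-9:
            lo = mid   # decoding succeeds at erasure probability mid
        else:
            hi = mid   # the recursion gets stuck at a positive fixed point
    return lo

threshold_36 = bec_threshold(3, 6)
```

For the (3, 6)-regular ensemble this numerical estimate comes out at about 0.429, compared with the Shannon limit of 0.5 for rate 1/2 on the BEC.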

The recent research on (irregular) LDPC codes shares the same essential features of the above 
program. The key difference is that the requirement of an explicit code description in Step 1 
is relaxed. This is because for irregular graphs with specific requirements on degree distribution, 
explicit constructions of large girth graphs seem very hard. Instead, a factor graph chosen randomly 
from a suitable ensemble is used. This raises issues such as the concentration of the performance 
of a random code around the average behavior of the ensemble. It also calls for justification of 
the large girth assumption in the decoding. We will return to these aspects when we begin our 
discussion of irregular LDPC codes in Section 6.

We should point out that Gallager himself used random regular LDPC codes for his experiments 
with iterative decoders for various channels such as the BSC, the BIAWGN, and the Rayleigh fading 
channel. However, if we so desire, for the analytic results, even explicit constructions are possible. 
In the rest of this section, we assume an explicit large girth factor graph is used, and focus on the 
analysis of some simple and natural iterative decoders. Thus the only randomness involved is the 
one realizing the channel noise. 



5.2 Decoding on the binary erasure channel 

Although Gallager did not explicitly study the BEC, his methods certainly apply to it, and we 
begin by studying the BEC. For the BEC, there is essentially a unique choice for a non-trivial 
message-passing decoding algorithm. In a variable-to-check message round, a variable whose bit 
value is known (either from the channel output or from a check node in a previous round) passes 
along its value to the neighboring check nodes, and a variable whose bit value is not yet determined 
passes a symbol (say 0) signifying erasure. In the check-to-variable message round, a check node c 
passes to a neighbor v an erasure if it receives an erasure from at least one neighbor besides v, and 
otherwise passes the bit value b to v where b is the parity of the bits received from neighbors other 
than v. Formally, the message maps are given as follows: 



$$\Psi_v(r, m_1, \ldots, m_{d_v-1}) = \begin{cases} b & \text{if at least one of } r, m_1, \ldots, m_{d_v-1} \text{ equals } b \in \{1,-1\}, \\ 0 & \text{if } r = m_1 = \cdots = m_{d_v-1} = 0. \end{cases}$$



(Note that the map is well-defined since the inputs to a variable node will never give conflicting 
±1 votes on its value.) 

$$\Psi_c(m_1, \ldots, m_{d_c-1}) = \prod_{i=1}^{d_c-1} m_i .$$

We note that an implementation of the decoder is possible that uses each edge of the factor graph
for message passing exactly once. Indeed, once a variable node's value is known, the bit value is 
communicated to its neighboring check nodes, and this node (and edges incident on it) are removed 
from the graph. Each check node maintains the parity of the values received from its neighboring 
variables so far, and updates this after each round of variable messages (note that it receives each 
variable node's value exactly once). When a check node has degree exactly one (i.e., values of all 
but one of its variable node neighbors are now known), it communicates the parity value it has
stored to its remaining neighbor, and both the check node and the remaining edge incident on it are
deleted. This version of the iterative decoder has been dubbed the Peeling Decoder. The running
time of the Peeling Decoder is essentially the number of edges in the factor graph, and hence it
performs about $d_v$ operations per codeword bit.
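The peeling procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: bits are written as 0/1 (rather than the ±1 convention used for the message maps), erasures are represented by `None`, the factor graph is given as a hypothetical list `checks` of variable-index lists, and the graph is assumed to have no repeated edges.

```python
from collections import defaultdict

def peel(checks, received):
    """Peeling decoder sketch for the BEC.

    checks[c] lists the variable indices attached to check node c.
    received holds 0/1 for known bits and None for erased bits.
    Returns the word with every erasure that peeling can resolve filled in.
    """
    bits = list(received)
    parity = [0] * len(checks)          # XOR of the known neighbours of each check
    unknown = [set() for _ in checks]   # still-erased neighbours of each check
    var_to_checks = defaultdict(list)
    for c, nbrs in enumerate(checks):
        for v in nbrs:
            var_to_checks[v].append(c)
            if bits[v] is None:
                unknown[c].add(v)
            else:
                parity[c] ^= bits[v]
    # A check with exactly one erased neighbour determines that neighbour.
    ready = [c for c in range(len(checks)) if len(unknown[c]) == 1]
    while ready:
        c = ready.pop()
        if len(unknown[c]) != 1:
            continue                    # its last erasure was resolved elsewhere
        v = unknown[c].pop()
        bits[v] = parity[c]             # the value that makes check c sum to zero
        for c2 in var_to_checks[v]:
            if v in unknown[c2]:
                unknown[c2].discard(v)
                parity[c2] ^= bits[v]
                if len(unknown[c2]) == 1:
                    ready.append(c2)
    return bits
```

Each variable's value is pushed to each of its checks at most once, which mirrors the observation that the running time is proportional to the number of edges. If the erased positions contain a set of variables in which every touched check sees at least two erasures, the loop stops early and those positions stay `None`.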

Let us analyze this decoding algorithm for $\ell$ iterations, where $\ell$ is a constant (chosen large
enough to achieve the desired bit-error probability). We will assume that the factor graph does not
have any cycle of length at most $4\ell$ (which is certainly true if it has $\Omega(\log n)$ girth).

The following is crucial to our analysis. 

Lemma 1 For each node, the random variables corresponding to the messages received by it in the
i'th iteration are all independent, for $i \le \ell$.

Let us justify why the above is the case. For this, we crucially use the fact that the message sent
along an edge, say from $v$ to $c$, does not depend on the message that $v$ receives from $c$. Therefore,
the information received at a check node $c$ (the situation for variable nodes is identical) from its
neighbors in the i'th iteration is determined by a computation graph rooted at $c$, with its $d_c$
variable node neighbors as its children, the $d_v - 1$ neighbors besides $c$ of each of these variable nodes
as their children, the $d_c - 1$ other neighbors of these check nodes as their children, and so on. Since
the girth of the graph is greater than $4\ell$, the computation graph is in fact a tree. Therefore, the
messages received by $c$ from its neighbors in the i'th iteration are all independent.

Take an arbitrary edge $(v, c)$ between variable node $v$ and check node $c$. Let us compute the
probability $p_i$ that the message from $v$ to $c$ in the i'th iteration is an erasure (using induction and
the argument below, one can justify the claim that this probability, which is taken over the channel
noise, will be independent of the edge and only depend on the iteration number, as long as $i \le \ell$).
For $i = 0$, $p_0 = \alpha$, the probability that the bit value for $v$ was erased by the $\mathrm{BEC}_\alpha$. In the (i + 1)'st
iteration, $v$ passes an erasure to $c$ iff it was originally erased by the channel, and it received an
erasure from each of its $d_v - 1$ neighbors other than $c$. Each of these neighboring check nodes $c'$ in
turn sends an erasure to $v$ iff at least one neighbor of $c'$ other than $v$ sent an erasure to $c'$ during
iteration $i$; due to the independence of the involved messages, this event occurs for node $c'$ with
probability $1 - (1 - p_i)^{d_c-1}$. Again, because the messages from various check nodes to $v$ in the
(i + 1)'st round are independent, we have

$$p_{i+1} = \alpha \cdot \left(1 - (1 - p_i)^{d_c-1}\right)^{d_v-1} . \qquad (1)$$

By linearity of expectation, $p_i$ is the expected fraction of variable-to-check messages sent in
the i'th iteration that are erasures. We would like to show that $\lim_{i\to\infty} p_i = 0$, so that the
bit-error probability of the decoding vanishes as the number of iterations grows. The largest erasure
probability $\alpha$ for which this happens is given by the following lemma.

Lemma 2 The threshold erasure probability $\alpha^{MP}(d_v, d_c)$ for the BEC below which the
message-passing algorithm results in vanishing bit-erasure probability is given by

$$\alpha^{MP}(d_v, d_c) = \min_{x \in [0,1]} \frac{x}{\left(1 - (1 - x)^{d_c-1}\right)^{d_v-1}} . \qquad (2)$$



Proof. By definition, $\alpha^{MP}(d_v, d_c) = \sup\{\alpha \in [0,1] : \lim_{i\to\infty} p_i = 0\}$ where $p_i$ is as defined recursively
in (1). Define the functions $g(x) = \frac{x}{(1-(1-x)^{d_c-1})^{d_v-1}}$ and $f(\alpha, x) = \alpha\,(1-(1-x)^{d_c-1})^{d_v-1}$. Also
let $\alpha^* = \min_{x \in [0,1]} g(x)$. We wish to prove that $\alpha^{MP}(d_v, d_c) = \alpha^*$.

If $\alpha < \alpha^*$, then for every $x \in [0,1]$, $f(\alpha, x) = \frac{\alpha x}{g(x)} \le \frac{\alpha^* x}{g(x)} \le x$, and in fact $f(\alpha, x) < x$ for
$x \in (0,1]$. Hence it follows that $p_{i+1} = f(\alpha, p_i) \le p_i$, and since $0 \le f(\alpha, x) \le \alpha$ for all $x \in [0,1]$, the
probability converges to a value $p_\infty \in [0, \alpha]$. Since $f$ is continuous, we have $p_\infty = f(\alpha, p_\infty)$, which
implies $p_\infty = 0$ (since $f(\alpha, x) < x$ for $x > 0$). This shows that $\alpha^{MP}(d_v, d_c) \ge \alpha^*$.

Conversely, if $\alpha > \alpha^*$, then let $x_0 \in [0,1]$ be such that $\alpha > g(x_0)$. Then $\alpha \ge f(\alpha, x_0) =
\frac{\alpha x_0}{g(x_0)} > x_0$, and of course $f(\alpha, \alpha) \le \alpha$. Since $f(\alpha, x)$ is a continuous function of $x$, we must have
$f(\alpha, x^*) = x^*$ for some $x^* \in (x_0, \alpha]$. For the recursion (1) with a fixed value of $\alpha$, it is easy to
see by induction that if $p_0 \ge p'_0$, then $p_i \ge p'_i$ for all $i \ge 1$. If $p'_0 = x^*$, then we have $p'_i = x^*$ for
all $i$. Therefore, when $p_0 = \alpha \ge x^*$, we have $p_i \ge x^*$ for all $i$ as well. In other words, the error
probability stays bounded below by $x^*$ irrespective of the number of iterations. This proves that
$\alpha^{MP}(d_v, d_c) \le \alpha^*$.

Together, we have exactly determined the threshold to be $\alpha^* = \min_{x \in [0,1]} g(x)$. ■



Remark 3 Using standard calculus, we can determine $\alpha^{MP}(d_v, d_c)$ to be $\frac{1-\gamma}{(1-\gamma^{d_c-1})^{d_v-1}}$ where $\gamma$ is
the unique positive root of the polynomial $p(x) = ((d_v - 1)(d_c - 1) - 1)\,x^{d_c-2} - \sum_{i=0}^{d_c-3} x^i$. Note that
when $d_v = 2$, $p(1) = 0$, so the threshold equals 0. Thus we must pick $d_v \ge 3$, and hence $d_c \ge 4$ (to
have positive rate). For the choice $d_v = 3$ and $d_c = 4$, $p(x)$ is a quadratic and we can analytically
compute $\alpha^{MP}(3,4) \approx 0.6474$; note that capacity for this rate equals 3/4 = 0.75. (The best threshold
one can hope for equals $d_v/d_c$ since the rate is at least $1 - d_v/d_c$.) Closed form analytic expressions
for some other small values of $(d_v, d_c)$ are given in [?]: for example, $\alpha^{MP}(3,5) \approx 0.5176$ (compare
to capacity of 0.6) and $\alpha^{MP}(3,6) \approx 0.4294$ (compare to capacity of 0.5).
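The threshold formula (2) and the recursion (1) are easy to explore numerically. The following is a minimal sketch, not a definitive implementation: the function names are hypothetical, and the minimum in (2) is approximated by a plain grid search over $x \in (0,1]$ rather than by the root-finding route of Remark 3.

```python
def bec_step(alpha, x, dv, dc):
    """One application of recursion (1): p_{i+1} as a function of p_i = x."""
    return alpha * (1.0 - (1.0 - x) ** (dc - 1)) ** (dv - 1)

def bec_erasure_after(alpha, dv, dc, iters):
    """Erasure probability of a variable-to-check message after `iters` rounds."""
    p = alpha  # p_0 is the channel erasure probability
    for _ in range(iters):
        p = bec_step(alpha, p, dv, dc)
    return p

def bec_threshold(dv, dc, grid=20000):
    """Grid approximation of min over x in (0,1] of x / (1-(1-x)^(dc-1))^(dv-1)."""
    best = float("inf")
    for k in range(1, grid + 1):
        x = k / grid
        best = min(best, x / (1.0 - (1.0 - x) ** (dc - 1)) ** (dv - 1))
    return best
```

Calling `bec_threshold(3, 6)` gives a value close to 0.4294, and iterating `bec_erasure_after` with an erasure probability slightly below or slightly above that value shows the two regimes from the proof of Lemma 2: the sequence either collapses to zero or stalls at a positive fixed point.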

Theorem 4 For integers $3 \le d_v < d_c$, there exists an explicit family of binary linear codes of rate
at least $1 - \frac{d_v}{d_c}$ that can be reliably decoded in linear time on $\mathrm{BEC}_\alpha$ provided $\alpha < \alpha^{MP}(d_v, d_c)$.⁸

5.3 Decoding on the BSC 

The relatively clean analysis of regular LDPC codes on the BEC is surely encouraging. As mentioned
earlier, Gallager in fact did not consider the BEC in his work. We now discuss one of his
decoding algorithms for the BSC, that has been dubbed Gallager's Algorithm A, and some simple 
extensions of it. 

5.3.1 Gallager's Algorithm A 

The message alphabet of Algorithm A will equal {1,-1}, so the nodes simply pass guesses on 
codeword bits. The message maps are time invariant and do not depend on the iteration number, 
so we will omit the superscript indicating the iteration number in describing the message maps. 
The check nodes send a message to a variable node indicating the parity of the other neighboring 
variables, or formally: 

$$\Psi_c(m_1, \ldots, m_{d_c-1}) = \prod_{i=1}^{d_c-1} m_i$$

8 Our analysis showed that the bit-error probability can be made below any desired $\epsilon > 0$ by picking the number
of iterations to be a large enough constant. A more careful analysis using $\ell(n) = \Omega(\log n)$ iterations shows that
bit-error probability is at most $\exp(-n^{\beta})$ for some constant $\beta = \beta(d_v, d_c)$. By a union bound, the entire codeword is
thus correctly recovered with high probability.
The variable nodes send to a neighboring check node their original received value unless the incoming
messages from the other check nodes unanimously indicate otherwise, in which case it sends the
negative of the received value. Formally,

$$\Psi_v(r, m_1, \ldots, m_{d_v-1}) = \begin{cases} -r & \text{if } m_1 = \cdots = m_{d_v-1} = -r, \\ r & \text{otherwise.} \end{cases}$$

As in the case of BEC, we will track the expected fraction of variable-to-check node messages that
are erroneous in the i'th iteration. Since we assume the all-ones codeword was transmitted, this is
simply the expected fraction of messages that equal $-1$. Let $p_i$ be the probability (over the channel
noise) that a particular variable-to-check node message in iteration $i$ equals $-1$ (as in the case of
the BEC, this is independent of the actual edge for $i \le \ell$). Note that we have $p_0 = p$, the crossover
probability of the BSC.

It is a routine calculation using the independence of the incoming messages to prove the following
recursive equation ([?, Sec. 4.3], [23, Sec. III]):

$$p_{i+1} = p_0 - p_0 \left(\frac{1 + (1 - 2p_i)^{d_c-1}}{2}\right)^{d_v-1} + (1 - p_0) \left(\frac{1 - (1 - 2p_i)^{d_c-1}}{2}\right)^{d_v-1} \qquad (3)$$



For a fixed value of $p_0$, $p_{i+1}$ is an increasing function of $p_i$, and for a fixed value of $p_i$, $p_{i+1}$ is an
increasing function of $p_0$. Therefore, by induction $p_i$ is an increasing function of $p_0$. Define the
threshold value of this algorithm "A" as $p^A(d_v, d_c) = \sup\{p_0 \in [0,1] : \lim_{\ell\to\infty} p_\ell = 0\}$. By the
above argument, if the crossover probability $p < p^A(d_v, d_c)$, then the expected fraction of erroneous
messages in the $\ell$'th iteration approaches 0 as $\ell \to \infty$.

Regardless of the exact quantitative value, we want to point out that when $d_v \ge 3$, the threshold
is positive. Indeed, for $d_v > 2$, for small enough $p_0 > 0$, one can see that $p_{i+1} < p_i$ for $0 < p_i \le p_0$
and $p_{i+1} = p_i$ for $p_i = 0$, which means that $\lim_{i\to\infty} p_i = 0$.
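Recursion (3) can be iterated directly to estimate the threshold of Algorithm A. The sketch below is illustrative only: the helper names, the iteration cap and the tolerance are assumptions, and the bisection relies on the monotonicity of $p_i$ in $p_0$ noted above.

```python
def gallager_a_step(p0, p, dv, dc):
    """One application of recursion (3) for Gallager's Algorithm A on the BSC."""
    t = (1.0 - 2.0 * p) ** (dc - 1)
    all_correct = ((1.0 + t) / 2.0) ** (dv - 1)  # every other check message is right
    all_wrong = ((1.0 - t) / 2.0) ** (dv - 1)    # every other check message is wrong
    return p0 - p0 * all_correct + (1.0 - p0) * all_wrong

def converges_to_zero(p0, dv, dc, iters=5000, tol=1e-12):
    """Heuristic test: does p_i fall below `tol` within `iters` iterations?"""
    p = p0
    for _ in range(iters):
        p = gallager_a_step(p0, p, dv, dc)
        if p < tol:
            return True
    return False

def gallager_a_threshold(dv, dc, steps=30):
    """Bisection on p0; valid because p_i is increasing in p0."""
    lo, hi = 0.0, 0.5
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if converges_to_zero(mid, dv, dc):
            lo = mid
        else:
            hi = mid
    return lo
```

Because convergence is only tested for a finite number of iterations, the returned value is a slight underestimate of the true threshold; raising `iters` tightens it at the cost of running time.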

Exact analytic expressions for the threshold have been computed for some special cases.
This is based on the characterization of $p^A(d_v, d_c)$ as the supremum of all $p_0 > 0$ for which

$$x = p_0 - p_0 \left(\frac{1 + (1 - 2x)^{d_c-1}}{2}\right)^{d_v-1} + (1 - p_0) \left(\frac{1 - (1 - 2x)^{d_c-1}}{2}\right)^{d_v-1}$$

does not have a strictly positive solution $x$ with $x \le p_0$. Below are some example values of the
threshold (up to the stated precision). Note that the rate of the code is $1 - d_v/d_c$ and the Shannon
limit is $H^{-1}(d_v/d_c)$ (where $H^{-1}(y)$ for $0 \le y \le 1$ is defined as the unique value of $x \in [0, 1/2]$ such
that $H(x) = y$).



$d_v$    $d_c$    $p^A(d_v, d_c)$    Capacity
3        6        0.0395             0.11
4        8        1/21               0.11
5        10       1/36               0.11
4        6        1/15               0.174
3        4        0.106              0.215
3        5        0.0612             0.146


5.3.2 Gallager's Algorithm B 

Gallager proposed an extension to the above algorithm, which is now called Gallager's Algorithm
B, in which a variable node decides to flip its value in an outgoing message when at least $b$ of the
incoming messages suggest that it ought to flip its value. In Algorithm A, we have $b = d_v - 1$. The
threshold $b$ can also depend on the iteration number, and we will denote this value by $b_i$ during
the i'th iteration. Formally, the variable message map in the i'th iteration is given by



$$\Psi_v^{(i)}(r, m_1, \ldots, m_{d_v-1}) = \begin{cases} -r & \text{if } |\{j : m_j = -r\}| \ge b_i, \\ r & \text{otherwise.} \end{cases}$$



The check node message maps remain the same. The threshold should be greater than (d v — l)/2 
since intuitively one should flip only when more check nodes suggest a flip than those that suggest 
the received value. So when d v = 3, the above algorithm reduces to Algorithm A. 

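Defining the probability of an incorrect variable-to-check node message in the i'th iteration to
be $p_i$, one can show the recurrence ([?, Sec. 4.3]):

$$p_{i+1} = p_0 - p_0 \sum_{j=b_{i+1}}^{d_v-1} \binom{d_v-1}{j} \left(\frac{1 + (1-2p_i)^{d_c-1}}{2}\right)^{j} \left(\frac{1 - (1-2p_i)^{d_c-1}}{2}\right)^{d_v-1-j}$$
$$\qquad\quad + (1 - p_0) \sum_{j=b_{i+1}}^{d_v-1} \binom{d_v-1}{j} \left(\frac{1 - (1-2p_i)^{d_c-1}}{2}\right)^{j} \left(\frac{1 + (1-2p_i)^{d_c-1}}{2}\right)^{d_v-1-j}$$

The cut-off value $b_{i+1}$ can then be chosen to minimize this value. The solution to this minimization
is the smallest integer $b_{i+1}$ for which

$$\frac{1 - p_0}{p_0} \le \left(\frac{1 + (1-2p_i)^{d_c-1}}{1 - (1-2p_i)^{d_c-1}}\right)^{2b_{i+1} - d_v + 1}$$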

By the above expression, we see that as $p_i$ decreases, $b_{i+1}$ never increases. And, once $p_i$ is sufficiently
small, $b_{i+1}$ takes the value $d_v/2$ for even $d_v$ and $(d_v + 1)/2$ for odd $d_v$. Therefore, a variable node
flips its value when a majority of the $d_v - 1$ incoming messages suggest that the received value was
an error. We note that this majority criterion for flipping a variable node's bit value was also used
in decoding of expander codes [?].

Similar to the analysis of Algorithm A, using the above recurrence, one can show that when
$d_v \ge 3$, for sufficiently small $p_0 > 0$, we have $p_{i+1} < p_i$ when $0 < p_i \le p_0$, and of course when
$p_i = 0$, we have $p_{i+1} = 0$. Therefore, when $d_v \ge 3$, for small enough $p_0 > 0$, we have $\lim_{i\to\infty} p_i = 0$
and thus a positive threshold.

The values of the threshold of this algorithm for small pairs (d v ,d c ) appear in [23] . For the 
pairs (4,8), (4,6) and (5,10) the thresholds are about 0.051, 0.074, and 0.041 respectively. For 
comparison, for these pairs Algorithm A achieved a threshold of about 0.047, 0.066, and 0.027 
respectively. 
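One step of the Algorithm B recurrence, together with the choice of cut-off, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the cut-off is found by directly minimizing the one-step error probability over the admissible range $b > (d_v - 1)/2$, and the formula used is the standard form of Gallager's recurrence in which $b = d_v - 1$ recovers recursion (3).

```python
from math import comb

def gallager_b_step(p0, p, dv, dc, b):
    """Error probability after one more iteration with flipping cut-off b."""
    t = (1.0 - 2.0 * p) ** (dc - 1)
    good = (1.0 + t) / 2.0  # a check-to-variable message is correct
    bad = (1.0 - t) / 2.0   # a check-to-variable message is wrong
    # Received value wrong, and at least b of the dv-1 messages correct it.
    fixed = sum(comb(dv - 1, j) * good**j * bad**(dv - 1 - j) for j in range(b, dv))
    # Received value right, but at least b of the dv-1 messages overturn it.
    spoiled = sum(comb(dv - 1, j) * bad**j * good**(dv - 1 - j) for j in range(b, dv))
    return p0 - p0 * fixed + (1.0 - p0) * spoiled

def best_cutoff(p0, p, dv, dc):
    """Cut-off b > (dv-1)/2 that minimizes the next-iteration error probability."""
    candidates = range((dv - 1) // 2 + 1, dv)
    return min(candidates, key=lambda b: gallager_b_step(p0, p, dv, dc, b))
```

For a (4,8)-regular code with $p_0 = 0.01$, this selects the unanimous rule $b = 3$ while the message error probability is still comparable to $p_0$, and switches to the majority rule $b = 2$ once the message error probability has become much smaller, in line with the monotone behavior of the cut-off described above.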



5.3.3 Using Erasures in the Decoder 

In both the above algorithms, each message made up its mind on whether to guess 1 or $-1$ for a
bit. But it may be judicious to sometimes abstain from guessing, i.e., to send an "erasure" message
(with value 0), if there is no good reason to guess one way or the other. For example, this may
be the appropriate course of action if a variable node receives one-half 1's and one-half $-1$'s in the
incoming check node messages. This motivates an algorithm with message alphabet $\{1, 0, -1\}$ and
the following message maps (in iteration $\ell$):

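$$\Psi_v^{(\ell)}(r, m_1, m_2, \ldots, m_{d_v-1}) = \mathrm{sgn}\left(w^{(\ell)} r + \sum_{j=1}^{d_v-1} m_j\right)$$

and

$$\Psi_c^{(\ell)}(m_1, m_2, \ldots, m_{d_c-1}) = \prod_{j=1}^{d_c-1} m_j$$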

The weight $w^{(\ell)}$ dictates the relative importance given to the received value compared to the
suggestions by the check nodes in the $\ell$'th iteration. These weights add another dimension of
design choices that one can optimize.
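The two message maps of the erasure-aware decoder are small enough to write out directly. The sketch below assumes messages are Python integers in $\{1, 0, -1\}$ and that the weight is passed in explicitly; the function names are hypothetical.

```python
def sign(x):
    """Sign function with sign(0) = 0, so a tie becomes an erasure."""
    return (x > 0) - (x < 0)

def variable_message(r, incoming, w):
    """Variable-to-check map: weighted vote of the received value r and the
    other d_v - 1 incoming check messages."""
    return sign(w * r + sum(incoming))

def check_message(incoming):
    """Check-to-variable map: product of the other d_c - 1 incoming messages.
    A single erasure (0) among the inputs forces an erasure output."""
    out = 1
    for m in incoming:
        out *= m
    return out
```

With weight 2, a received value of 1 opposed by two check messages of $-1$ yields an erasure, whereas with weight 1 the same inputs yield $-1$; this is the sense in which the weight trades off trust in the channel against trust in the checks.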

Exact expressions for the probabilities $p_{-1}^{(i)}$ and $p_0^{(i)}$ that a variable-to-check message is an error
(equals $-1$) and an erasure (equals 0) respectively in the i'th iteration can be written down [23].
These can be used to pick appropriate weights $w^{(i)}$. For the (3,6)-regular code, $w^{(1)} = 2$ and
$w^{(i)} = 1$ for $i \ge 2$ is reported as the optimum choice in [23], and using this choice the resulting
algorithm has a threshold of about 0.07, which is a good improvement over the 0.04 achieved by
Algorithm A. More impressively, this is close to the threshold of 0.084 achieved by the "optimal"
belief propagation decoder. A heuristic to pick the weights $w^{(i)}$ is suggested in [23] and the threshold
of the resulting algorithm is computed for small values of $(d_v, d_c)$.

5.4 Decoding on BIAWGN 

We now briefly turn to the BIAWGN channel. We discussed the most obvious quantization of the
channel output which converts the channel to a BSC with crossover probability $Q(1/\sigma)$. There is a
natural way to incorporate erasures into the quantization. We pick a threshold $\tau$ around zero, and
quantize the AWGN channel output $r$ into $-1$, 0 (which corresponds to erasure), or 1 depending on
whether $r \le -\tau$, $-\tau < r < \tau$, or $r \ge \tau$, respectively. We can then run exactly the above
message-passing algorithm (the one using erasures). More generally, we can pick a separate threshold $\tau_i$ for
each iteration $i$; the choice of $\tau_i$ and $w^{(i)}$ can be optimized using some heuristic criteria. Using
this approach, a threshold of $\sigma^* = 0.743$ is reported for communication using a (3,6)-regular LDPC
code on the BIAWGN channel. This corresponds to a raw bit-error probability of $Q(1/\sigma^*) = 0.089$,
which is almost 2% greater than the threshold crossover probability of about 0.07 achieved on the
BSC. So even with a ternary message alphabet, providing soft information (instead of quantized
hard bit decisions) at the input to the decoder can lead to a good performance gain. The belief
propagation algorithm we discuss next uses a much larger message alphabet and yields further
substantial improvements for the BIAWGN.
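The ternary quantization and the raw bit-error probability quoted above can be reproduced with the standard library. This sketch assumes $\pm 1$ signalling with noise standard deviation $\sigma$, so that a hard decision errs with probability $Q(1/\sigma)$; the function names are hypothetical.

```python
from math import erfc, sqrt

def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0,1) > x]."""
    return 0.5 * erfc(x / sqrt(2.0))

def quantize(y, tau):
    """Map a real channel output to -1, 0 (erasure) or 1 using threshold tau."""
    if y <= -tau:
        return -1
    if y >= tau:
        return 1
    return 0

def raw_bit_error(sigma):
    """Crossover probability of the BSC obtained by hard-decision quantization."""
    return q_function(1.0 / sigma)
```

Evaluating `raw_bit_error(0.743)` gives approximately 0.089, matching the figure quoted for the (3,6)-regular code.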

5.5 The belief propagation decoder 

So far we have discussed decoders with quantized, discrete messages taking on very few values.
Naturally, we can expect more powerful decoders if more detailed information, such as real values
quantifying the likelihood of a bit being ±1, are passed in each iteration. We now describe the
"belief propagation" (BP) decoder which is an instance of such a decoder (using a continuous
message alphabet). We follow the description in [23, Sec. III-B]. In belief propagation, the messages
sent along an edge $e$ represent the posterior conditional distribution on the bit associated with the
variable node incident on $e$. This distribution corresponds to a pair of nonnegative reals $p_1, p_{-1}$
satisfying $p_1 + p_{-1} = 1$. This pair can be encoded as a single real number (including $\pm\infty$) using the
log-likelihood ratio $\log \frac{p_1}{p_{-1}}$, and the messages used by the BP decoder will follow this representation.

Each node acts under the assumption that each message communicated to it in a given round 
is a conditional distribution on the associated bit, and further each such message is conditionally 
independent of the others. Upon receiving the messages, a node transmits to each neighbor the 
conditional distribution of the bit conditioned on all information except the information from that 
neighbor (i.e., only extrinsic information is used in computing a message). If the graph has large 
enough girth compared to the number of iterations, this assumption is indeed met, and the messages
at each iteration reflect the true log-likelihood ratio given the observed values in the tree
neighborhood of appropriate depth. 

If $l_1, l_2, \ldots, l_k$ are the likelihood ratios of the conditional distribution of a bit conditioned on
independent random variables, then the likelihood ratio of the bit value conditioned on all of the
random variables equals $\prod_{i=1}^{k} l_i$. Therefore, log-likelihoods of independent messages add up, and
this leads to the variable message map (which is independent of the iteration number):

$$\Psi_v(m_0, m_1, \ldots, m_{d_v-1}) = \sum_{i=0}^{d_v-1} m_i$$

where $m_0$ is the log-likelihood ratio of the bit based on the received value (e.g., for the $\mathrm{BSC}_p$,
$m_0 = r \log \frac{1-p}{p}$ where $r \in \{1,-1\}$ is the received value).

The performance of the decoder is analyzed by tracking the evolution of the probability density
of the log-likelihood ratios (hence the name "density evolution" for this style of analysis). By the
above, given densities $P_0, P_1, \ldots, P_{d_v-1}$ on the real quantities $m_0, m_1, \ldots, m_{d_v-1}$, the density of
$\Psi_v(m_0, m_1, \ldots, m_{d_v-1})$ is the convolution $P_0 \otimes P_1 \otimes \cdots \otimes P_{d_v-1}$ over the reals of those densities.
In the computation, one has $P_1 = P_2 = \cdots = P_{d_v-1}$ and the densities will be quantized, and the
convolution can be efficiently computed using the FFT.

Let us now turn to the situation for check nodes. Given bits $b_i$, $1 \le i \le k$, with independent
probability distributions $(p_1^i, p_{-1}^i)$, what is the distribution $(p_1, p_{-1})$ of the bit $b = \prod_{i=1}^{k} b_i$? We
have the expectation

$$E[b] = E\left[\prod_i b_i\right] = \prod_i E[b_i] = \prod_i (p_1^i - p_{-1}^i) .$$

Therefore we have $p_1 - p_{-1} = \prod_{i=1}^{k} (p_1^i - p_{-1}^i)$. Now if $m$ is the log-likelihood ratio $\log \frac{p_1}{p_{-1}}$, then
$p_1 - p_{-1} = \frac{e^m - 1}{e^m + 1} = \tanh(m/2)$. Conversely, if $p_1 - p_{-1} = q$, then $\log \frac{p_1}{p_{-1}} = \log \frac{1+q}{1-q}$. These
calculations lead to the following check node map for the log-likelihood ratio:

$$\Psi_c(m_1, m_2, \ldots, m_{d_c-1}) = \log\left(\frac{1 + \prod_{i=1}^{d_c-1} \tanh(m_i/2)}{1 - \prod_{i=1}^{d_c-1} \tanh(m_i/2)}\right)$$



It seems complicated to track the density of $\Psi_c(m_1, m_2, \ldots, m_{d_c-1})$ based on those of the $m_i$'s.
However, as shown in [23], this can also be realized via a Fourier transform, albeit with a slight
change in representation of the conditional probabilities $(p_1, p_{-1})$. We skip the details and instead
point the reader to [23, Sec. III-B].
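The two belief propagation message maps act on individual log-likelihood ratios as follows. This is a minimal sketch of the per-message computation only (not of density evolution); the function names are hypothetical, and messages are assumed finite, since `math.atanh` raises an error when the product of hyperbolic tangents is exactly ±1.

```python
from math import atanh, log, tanh

def bsc_llr(r, p):
    """Channel log-likelihood ratio m_0 for received value r in {1, -1} on BSC_p."""
    return r * log((1.0 - p) / p)

def bp_variable(m0, incoming):
    """Variable map: log-likelihood ratios of independent observations add up."""
    return m0 + sum(incoming)

def bp_check(incoming):
    """Check map: log((1 + q) / (1 - q)) with q the product of tanh(m_i / 2)."""
    q = 1.0
    for m in incoming:
        q *= tanh(m / 2.0)
    return 2.0 * atanh(q)  # identical to log((1 + q) / (1 - q))
```

As a sanity check, two incoming messages that are each correct with probability 0.9 have $p_1 - p_{-1} = 0.8$, so their product bit has $p_1 - p_{-1} = 0.64$, that is, $p_1 = 0.82$; `bp_check` returns $\log(0.82/0.18)$ in that case.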

Using these ideas, we have an effective algorithm to recursively compute, to any desired degree
of accuracy, the probability density $P^{(\ell)}$ of the log-likelihood ratio of the variable-to-check node
messages in the $\ell$-th iteration, starting with an explicit description of the initial density $P^{(0)}$.
The initial density is simply the density of the log-likelihood ratio of the received value, assuming
transmission of the all-ones codeword; for example, for $\mathrm{BSC}_p$, the initial density $P^{(0)}$ is given by

$$P^{(0)}(x) = p\,\delta\left(x - \log\frac{p}{1-p}\right) + (1 - p)\,\delta\left(x - \log\frac{1-p}{p}\right)$$

where $\delta(x)$ is the Dirac delta function.

The threshold crossover probability for the BSC and the threshold variance for the BIAWGN
under belief propagation decoding for various small values of $(d_v, d_c)$ are computed by this method
and reported in [23]. For the (3,6) LDPC code, these thresholds are respectively $p^* = 0.084$
(compare with Shannon limit of 0.11) and $\sigma^* = 0.88$ (compare with Shannon limit of 0.9787).

The above numerical procedure for tracking the evolution of densities for belief propagation
and computing the associated threshold to any desired degree of accuracy has since been applied
with great success. In [22], the authors apply this method to irregular LDPC codes with optimized
structure and achieve a threshold of $\sigma^* = 0.9718$ with rate 1/2 for the BIAWGN, which is a mere
0.06 dB away from the Shannon capacity limit.⁹

6 Irregular LDPC codes 

Interest in LDPC codes surged following the seminal paper [?] that initiated the study of irregular
LDPC codes, and proved their potential by achieving the capacity on the BEC. Soon, it was realized
that the benefits of irregular LDPC codes extend to more powerful channels, and this led to a flurry
of activity. In this section, we describe some of the key elements of the analytic approach used
to study message-passing decoding algorithms for irregular LDPC codes.

6.1 Intuitive benefits of irregularity 

We begin with some intuition on why one might expect improved performance by using irregular 
graphs. In terms of iterative decoding, from the variable node perspective, it seems better to 
have high degree, since the more information it gets from check nodes, the more accurately it 
can guess its correct value. On the other hand, from the check node perspective, the lower its 
degree, the more valuable the information it can transmit back to its neighbors. (The XOR of 
several mildly unpredictable bits has a much larger unpredictability.) But in order to have good 
rate, there should be far fewer check nodes than variable nodes, and therefore meeting the above 
competing requirements is challenging. Irregular graphs provide significantly more flexibility in 
balancing the above incompatible degree requirements. It seems reasonable to believe that a wide 
spread of degrees for variable nodes could be useful. This is because one might expect that variable 
nodes with high degree will converge to their correct value quickly. They can then provide good 
information to the neighboring check nodes, which in turn provide better information to lower 
degree variable nodes, and so on leading to a cascaded wave effect. 

The big challenge is to leap from this intuition to the design of appropriate irregular graphs 
where this phenomenon provably occurs, and to provide analytic bounds on the performance of 
natural iterative decoders on such irregular graphs. 

9 The threshold signal-to-noise ratio $1/(\sigma^*)^2$ = 0.2487 dB, and the Shannon limit for rate 1/2 is 0.187 dB.







Compared to the regular case, there are additional technical issues revolving around how irregular
graphs are parameterized, how they are constructed (sampled), and how one deals with the
lack of explicit large-girth constructions. We discuss these issues in the next two subsections. 



6.2 The underlying ensembles 

We now describe how irregular LDPC codes can be parameterized and constructed (or rather
sampled). Assume we have an LDPC code with $n$ variable nodes with $\Lambda_i$ variable nodes of degree
$i$ and $P_i$ check nodes of degree $i$. We have $\sum_i \Lambda_i = n$, and $\sum_i i\Lambda_i = \sum_i iP_i$ as both these equal the
number of edges in the graph. Also $\sum_i P_i = n(1 - r)$ where $r$ is the designed rate of the code. It
is convenient to capture this information in the compact polynomial notation:

$$\Lambda(x) = \sum_{i \ge 2} \Lambda_i x^i , \qquad P(x) = \sum_{i \ge 1} P_i x^i .$$

We call the polynomials $\Lambda$ and $P$ the variable and check degree distributions from a node
perspective. Note that $\Lambda(1)$ is the number of variable nodes, $P(1)$ the number of check nodes, and
$\Lambda'(1) = P'(1)$ the number of edges.

Given such a degree distribution pair $(\Lambda, P)$, let $\mathrm{LDPC}(\Lambda, P)$ denote the "standard" ensemble
of bipartite (multi)graphs with $\Lambda(1)$ variable nodes and $P(1)$ check nodes, with $\Lambda_i$ variable nodes
and $P_i$ check nodes of degree $i$. This ensemble is defined by taking $\Lambda'(1) = P'(1)$ "sockets" on
each side, allocating $i$ sockets to a node of degree $i$ in some arbitrary manner, and then picking a
random matching between the sockets.

To each member of $\mathrm{LDPC}(\Lambda, P)$, we associate the code of which it is the factor graph. A
slight technicality: since we are dealing with multigraphs, in the parity check matrix, we place a
non-zero entry at row $i$ and column $j$ iff the $i$th check node is connected to the $j$th variable node
an odd number of times. Therefore, we can think of the above as an ensemble of codes, and by
abuse of notation also refer to it as $\mathrm{LDPC}(\Lambda, P)$. (Note that the graphs have a uniform probability
distribution, but the induced codes need not.) In the sequel, our LDPC codes will be obtained by
drawing a random element from the ensemble $\mathrm{LDPC}(\Lambda, P)$.
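The socket construction and the odd-multiplicity rule for the parity check matrix can be sketched as follows. The representation is an assumption made for illustration: degrees are given as plain lists (one entry per node), edges as (variable, check) pairs, and the function names are hypothetical.

```python
import random

def sample_socket_graph(var_degrees, check_degrees, seed=None):
    """Sample a bipartite multigraph by randomly matching sockets.

    var_degrees[v] and check_degrees[c] give the degree of each node.
    Returns a list of (variable, check) edges; repeated edges are possible.
    """
    if sum(var_degrees) != sum(check_degrees):
        raise ValueError("both sides must have the same number of sockets")
    rng = random.Random(seed)
    var_sockets = [v for v, d in enumerate(var_degrees) for _ in range(d)]
    check_sockets = [c for c, d in enumerate(check_degrees) for _ in range(d)]
    rng.shuffle(check_sockets)  # a uniformly random matching of the sockets
    return list(zip(var_sockets, check_sockets))

def parity_check_matrix(edges, n_vars, n_checks):
    """Entry (c, v) is 1 iff check c and variable v are joined an odd number of times."""
    H = [[0] * n_vars for _ in range(n_checks)]
    for v, c in edges:
        H[c][v] ^= 1
    return H
```

For example, `sample_socket_graph([3] * 6, [6] * 3, seed=1)` draws a member of the (3,6)-regular ensemble with six variable nodes; shuffling only one side is enough, since a uniformly random permutation of the check sockets already induces a uniformly random matching.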

To construct a family of codes, one can imagine using a normalized degree distribution giving
the fraction of nodes of a certain degree, and then considering an increasing number of nodes. For
purposes of analysis, it ends up being convenient to use normalized degree distributions from the
edge perspective. Let $\lambda_i$ and $\rho_i$ denote the fraction of edges incident to variable nodes and check
nodes of degree $i$ respectively. That is, $\lambda_i$ (resp. $\rho_i$) is the probability that a randomly chosen edge
is connected to a variable (resp. check) node of degree $i$. These distributions can be compactly
written in terms of the power series defined below:

$$\lambda(x) = \sum_i \lambda_i x^{i-1} , \qquad \rho(x) = \sum_i \rho_i x^{i-1} .$$

It is easily seen that $\lambda(x) = \frac{\Lambda'(x)}{\Lambda'(1)}$ and $\rho(x) = \frac{P'(x)}{P'(1)}$. If $M$ is the total number of edges, then
the number of variable nodes of degree $i$ equals $M\lambda_i/i$, and thus the total number of variable
nodes is $M \sum_i \lambda_i/i$. It follows that the average variable node degree equals $\frac{1}{\sum_i \lambda_i/i} = \frac{1}{\int_0^1 \lambda(z)\,dz}$.
Likewise, the average check node degree equals $\frac{1}{\int_0^1 \rho(z)\,dz}$. It follows that the designed rate can be
expressed in terms of $\lambda, \rho$ as

$$r = r(\lambda, \rho) = 1 - \frac{\int_0^1 \rho(z)\,dz}{\int_0^1 \lambda(z)\,dz} . \qquad (4)$$



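We also have the inverse relationships

$$\frac{\Lambda(x)}{n} = \frac{\int_0^x \lambda(z)\,dz}{\int_0^1 \lambda(z)\,dz} , \qquad \frac{P(x)}{n} = \frac{\int_0^x \rho(z)\,dz}{\int_0^1 \lambda(z)\,dz} . \qquad (5)$$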



Therefore, $(\Lambda, P)$ and $(n, \lambda, \rho)$ carry the same information (in the sense we can obtain each from
the other). For the asymptotic analysis we use $(n, \lambda, \rho)$ to refer to the LDPC code ensemble. There
is a slight technicality that for some $n$, the $(\Lambda, P)$ corresponding to $(n, \lambda, \rho)$ may not be integral. In
this case, rounding the individual node distributions to the closest integer has negligible effect on
the asymptotic performance of the decoder or the rate, and so this annoyance may be safely ignored.

The degree distributions $\lambda, \rho$ play a prominent role in this line of work, and the performance of
the decoder is analyzed and quantified in terms of these.
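The designed rate (4) and the inverse relationships (5) reduce to simple sums, because $\int_0^1 \sum_i \lambda_i z^{i-1}\,dz = \sum_i \lambda_i / i$. The sketch below uses exact rational arithmetic; representing an edge-perspective distribution as a dictionary from degree to edge fraction is an assumption made for illustration, as are the function names.

```python
from fractions import Fraction

def integral_01(edge_dist):
    """Integral over [0,1] of sum_i c_i x^(i-1), i.e. sum_i c_i / i.

    edge_dist maps a degree i to the fraction of edges c_i of that degree.
    """
    return sum(Fraction(frac) / deg for deg, frac in edge_dist.items())

def designed_rate(lam, rho):
    """Designed rate r(lambda, rho) from equation (4)."""
    return 1 - integral_01(rho) / integral_01(lam)

def node_counts(n, lam, rho):
    """Node-perspective counts (Lambda_i, P_i) for block length n, via (5).

    The results may be non-integral for some n, as discussed above.
    """
    norm = integral_01(lam)
    Lam = {i: n * Fraction(f) / i / norm for i, f in lam.items()}
    P = {i: n * Fraction(f) / i / norm for i, f in rho.items()}
    return Lam, P
```

For the (3,6)-regular case, `designed_rate({3: 1}, {6: 1})` evaluates to 1/2, and `node_counts(12, {3: 1}, {6: 1})` gives 12 variable nodes of degree 3 and 6 check nodes of degree 6. Passing `Fraction` values for the edge fractions avoids floating-point rounding in irregular examples.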

6.3 Concentration around average performance 

Given a degree distribution pair $(\lambda, \rho)$ and a block length $n$, the goal is to mimic Gallager's program
(outlined in Section 5.1), using a factor graph with degree distribution $(\lambda, \rho)$ in place of a
$(d_v, d_c)$-regular factor graph. However, the task of constructing explicit large girth graphs obeying precise
irregular degree distributions seems extremely difficult. Therefore, a key difference is to give up
on explicitness, and rather sample an element from the ensemble $\mathrm{LDPC}(n, \lambda, \rho)$, which can be done
easily as mentioned above.

It is not very difficult to show that a random code drawn from the ensemble will have the needed
girth (and thus be tree-like in a local neighborhood of every edge/vertex) with high probability;
see for instance [23, Appendix A]. A more delicate issue is the following: For the irregular case
the neighborhood trees out of different nodes have a variety of different possible structures, and
thus analyzing the behavior of the decoder on a specific factor graph (after it has been sampled,
even conditioning on it having large girth) seems hopeless. What is feasible, however, is to analyze
the average behavior of the decoder (such as the expected fraction, say $P_n^{(\lambda,\rho)}(\ell)$, of erroneous
variable-to-check messages in the $\ell$'th iteration) taken over all instances of the code drawn from the
ensemble $\mathrm{LDPC}(n, \lambda, \rho)$ and the realization of the channel noise. It can be shown that, as $n \to \infty$,
$P_n^{(\lambda,\rho)}(\ell)$ converges to a certain quantity $P_T^{(\lambda,\rho)}(\ell)$, which is defined as the probability (taken over
both choice of the graph and the noise) that an incorrect message is sent in the $\ell$'th iteration along
an edge $(v, c)$ assuming that the depth $2\ell$ neighborhood out of $v$ is a tree.

In order to define the probability $P_T^{(\lambda,\rho)}(\ell)$ more precisely, one uses a "tree ensemble" $T_\ell(\lambda, \rho)$
defined inductively as follows. $T_0(\lambda, \rho)$ consists of the trivial tree consisting of just a root variable
node. For $\ell \ge 1$, to sample from $T_\ell(\lambda, \rho)$, first sample an element from $T_{\ell-1}(\lambda, \rho)$. Next for each
variable leaf node (independently), with probability $\lambda_{i+1}$ attach $i$ check node children. Finally, for
each of the new check leaf nodes, independently attach $i$ variable node children with probability
$\rho_{i+1}$. The quantity $P_T^{(\lambda,\rho)}(\ell)$ is then formally defined as the probability that the outgoing message
from the root node of a sample $T$ from $T_\ell(\lambda, \rho)$ is incorrect, assuming the variable nodes are initially
labeled with 1 and then the channel noise acts on them independently (the probability is thus both
over the channel noise and the choice of the sample $T$ from $T_\ell(\lambda, \rho)$).
The convergence of $P_n^{(\lambda,\rho)}(\ell)$ to $P_T^{(\lambda,\rho)}(\ell)$ is a simple consequence of the fact that, for a random
choice of the factor graph from $\mathrm{LDPC}(n, \lambda, \rho)$, the depth $2\ell$ neighborhood of an edge is tree-like
with probability tending to 1 as $n$ gets larger (for more details, see [23, Thm. 2]).

The quantity $P_T^{(\lambda,\rho)}(\ell)$ for the case of trees is easily computed, similar to the case of regular
graphs, by a recursive procedure. One can then determine the threshold channel parameter for
which $P_T^{(\lambda,\rho)}(\ell) \to 0$ as $\ell \to \infty$.

However, this only analyzes the average behavior of the ensemble of codes. What we would like
is for a random code drawn from the ensemble LDPC$(n, \lambda, \rho)$ to concentrate around the average
behavior with high probability. This would mean that almost all codes behave alike and thus the
individual behavior of almost all codes is characterized by the average behavior of the ensemble
(which can be computed as outlined above). A major success of this theory is that such a
concentration phenomenon indeed holds, as shown in [17] and later extended to a large class of channels
in [23]. The proof uses martingale arguments where the edges of the factor graph and then the
inputs to the decoder are revealed one by one. We refrain from presenting the details here and
point the reader to [17, Thm. 1] and [23, Thm. 2] (the result is proved for regular ensembles in
these works but extends to irregular ensembles as long as the degrees in the graph are bounded).

In summary, it suffices to analyze and bound $P_{\mathcal{T}}^{(\lambda,\rho)}(\ell)$, and if this tends to 0 as $\ell \to \infty$, then in
the limit of a large number of decoding iterations, for almost all codes in the ensemble, the actual
bit error probability of the decoder tends to zero for large enough block lengths.

Order of limits: A remark on the order of the limits is in order. The proposed style of
analysis aims to determine the threshold channel parameter for which $\lim_{\ell \to \infty} \lim_{n \to \infty} \mathbb{E}[P_n^{(\lambda,\rho)}(\ell)] = 0$.
That is, we first fix the number of iterations and determine the limiting performance of an ensemble
as the block length tends to infinity, and then let the number of iterations tend to infinity.
Exchanging the order of limits gives us the quantity $\lim_{n \to \infty} \lim_{\ell \to \infty} \mathbb{E}[P_n^{(\lambda,\rho)}(\ell)]$. It is this limit that
corresponds to the more typical scenario in practice where for each fixed block length, we let the
iterative decoder run until no further progress is achieved. We are then interested in the limiting
performance as the block length tends to infinity. For the BEC, it has been shown that for both
orders of taking limits, we get the same threshold [25, Sec. 2.9.8]. Based on empirical observations,
the same has been conjectured for channels such as the BSC, but a proof of this seems to be out
of sight.



6.4 Analysis of average performance for the BEC 

We now turn to analyzing the average behavior of the ensemble LDPC$(n, \lambda, \rho)$ under message-passing
decoding on the BEC. (The algorithm for regular codes from Section 5.2 extends to irregular codes
in the obvious fashion: the message maps are the same, except that the maps at different nodes will
have different numbers of arguments.)

Lemma 5 (Performance of tree ensemble channel on BEC) Consider a degree distribution
pair $(\lambda, \rho)$ and a real number $0 < \alpha < 1$. Define $x_0 = \alpha$ and for $\ell \ge 1$,

$$x_\ell = \alpha \, \lambda(1 - \rho(1 - x_{\ell-1})) . \qquad (6)$$

Then, for the BEC with erasure probability $\alpha$, for every $\ell \ge 1$, we have $P_{\mathcal{T}}^{(\lambda,\rho)}(\ell) = x_\ell$.

Proof. The proof follows along the lines of the recursion that we established for the regular case.
The case $\ell = 0$ is clear since the initial variable-to-check message equals the received value, which
equals an erasure with probability $\alpha$. Assume that for $0 \le i < \ell$, $P_{\mathcal{T}}^{(\lambda,\rho)}(i) = x_i$. In the $\ell$'th iteration,
a check-to-variable node message sent by a degree $i$ check node is the erasure message if any of the
$(i - 1)$ incoming messages is an erasure, an event that occurs with probability $1 - (1 - x_{\ell-1})^{i-1}$
(since the incoming messages are independent and each is an erasure with probability $x_{\ell-1}$ by
induction). Since the edge has probability $\rho_i$ to be connected to a check node of degree $i$, the
erasure probability of a check-to-variable message in the $\ell$'th iteration for a randomly chosen edge
is equal to $\sum_i \rho_i (1 - (1 - x_{\ell-1})^{i-1}) = 1 - \rho(1 - x_{\ell-1})$. Now consider a variable-to-check message
in the $\ell$'th iteration sent by a variable node of degree $i$. This is an erasure iff the node was
originally erased and each of the $(i - 1)$ incoming messages is an erasure. Thus it is an erasure
with probability $\alpha (1 - \rho(1 - x_{\ell-1}))^{i-1}$. Averaging over the edge degree distribution $\lambda(\cdot)$, we have
$P_{\mathcal{T}}^{(\lambda,\rho)}(\ell) = \alpha \lambda(1 - \rho(1 - x_{\ell-1})) = x_\ell$. $\blacksquare$
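The recursion of Lemma 5 is easy to iterate numerically. The following is a minimal sketch; the function names and the representation of a degree distribution as a dictionary (edge degree $i$ mapped to the fraction of edges of that degree, so that $\lambda(x) = \sum_i \lambda_i x^{i-1}$) are choices made for this example.

```python
def poly_eval(dist, x):
    """Evaluate sum_i dist[i] * x**(i-1) for an edge-perspective degree distribution."""
    return sum(c * x ** (i - 1) for i, c in dist.items())

def bec_density_evolution(lam, rho, alpha, iterations):
    """Return [x_0, ..., x_iterations] with x_l = alpha * lam(1 - rho(1 - x_{l-1}))."""
    xs = [alpha]
    for _ in range(iterations):
        x_prev = xs[-1]
        xs.append(alpha * poly_eval(lam, 1.0 - poly_eval(rho, 1.0 - x_prev)))
    return xs

# Regular special case lam(x) = x^2, rho(x) = x^5 (variable degree 3, check degree 6).
below = bec_density_evolution({3: 1.0}, {6: 1.0}, 0.40, 200)[-1]
above = bec_density_evolution({3: 1.0}, {6: 1.0}, 0.45, 200)[-1]
```

For this regular example, the sequence decays toward 0 when started at $\alpha = 0.40$ and stalls at a positive fixed point when started at $\alpha = 0.45$, which is the behavior the threshold analysis predicts on either side of the threshold.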

The following lemma yields the threshold erasure probability for a given degree distribution
pair $(\lambda, \rho)$. The proof is identical to that of Lemma 2: we simply use the recursion (6) in place of
the recursion for the regular case. Note that the regular case is the special case when $\lambda(z) = z^{d_v-1}$ and $\rho(z) = z^{d_c-1}$.

Lemma 6 For the BEC, the threshold erasure probability $\alpha^{MP}(\lambda, \rho)$ below which the above iterative
message passing algorithm leads to vanishing bit-erasure probability as the number of iterations
grows is given by

$$\alpha^{MP}(\lambda, \rho) = \min_{x \in [0,1]} \frac{x}{\lambda(1 - \rho(1 - x))} . \qquad (7)$$
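The threshold in (7) can be approximated by a direct grid search. The sketch below is illustrative only: the function names and the dictionary encoding of $(\lambda, \rho)$ are assumptions of this example, and the grid is restricted to $x \in (0, 1]$ because the ratio is of the form $0/0$ at $x = 0$ whenever $\lambda(0) = 0$.

```python
def poly_eval(dist, x):
    """Evaluate sum_i dist[i] * x**(i-1) for an edge-perspective degree distribution."""
    return sum(c * x ** (i - 1) for i, c in dist.items())

def bec_threshold(lam, rho, grid=100000):
    """Approximate the minimum of x / lam(1 - rho(1 - x)) over a uniform grid in (0, 1]."""
    best = float("inf")
    for k in range(1, grid + 1):
        x = k / grid
        denom = poly_eval(lam, 1.0 - poly_eval(rho, 1.0 - x))
        if denom > 0.0:
            best = min(best, x / denom)
    return best

# Regular special case: variable degree 3, check degree 6.
threshold_3_6 = bec_threshold({3: 1.0}, {6: 1.0})
```

For the regular (3, 6) case this evaluates to approximately 0.4294, compared with the capacity limit of 0.5 for rate 1/2.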

6.5 Capacity achieving distributions for the BEC 

Having analyzed the performance possible on the BEC for a given degree distribution pair $(\lambda, \rho)$,
we now turn to the question of what pairs $(\lambda, \rho)$, if any, have a threshold approaching capacity.
Recalling the designed rate, the goal is to find $(\lambda, \rho)$ for which

$$\alpha^{MP}(\lambda, \rho) \approx \frac{\int_0^1 \rho(z)\,dz}{\int_0^1 \lambda(z)\,dz} .$$

We now discuss a recipe for constructing such degree distributions, as discussed in [20] and
[25, Sec. 2.9.11] (we follow the latter description closely). In the following we use parameters
$\theta > 0$ and a positive integer $N$ that will be fixed later. Let $\mathcal{D}$ be the space of non-zero functions
$h : [0, 1) \to \mathbb{R}^+$ which are analytic around zero with a Taylor series expansion comprising
non-negative coefficients. Pick functions $\lambda_\theta(x) \in \mathcal{D}$ and $\rho_\theta(x) \in \mathcal{D}$ that satisfy $\rho_\theta(1) = 1$ and

$$\lambda_\theta(1 - \rho_\theta(1 - x)) = x , \quad \forall x \in [0,1) . \qquad (8)$$

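Here are two example choices of such functions:

1. Heavy-Tail Poisson Distribution [15], dubbed "Tornado sequence" in the literature. Here we
take

$$\lambda_\theta(x) = -\frac{\ln(1 - x)}{\theta} = \frac{1}{\theta} \sum_{i=1}^{\infty} \frac{x^i}{i} , \quad \text{and} \quad \rho_\theta(x) = e^{\theta(x - 1)} = e^{-\theta} \sum_{i=0}^{\infty} \frac{\theta^i x^i}{i!} .$$

2. Check-concentrated degree distribution. Here for $\theta \in (0, 1)$ so that $1/\theta$ is an integer, we
take

$$\lambda_\theta(x) = 1 - (1 - x)^\theta = \sum_{i=1}^{\infty} \binom{\theta}{i} (-1)^{i-1} x^i , \quad \text{and} \quad \rho_\theta(x) = x^{1/\theta} .$$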
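Condition (8) can be checked numerically for both families. The sketch below assumes the closed forms $\lambda_\theta(x) = -\ln(1-x)/\theta$, $\rho_\theta(x) = e^{\theta(x-1)}$ for the Tornado sequence and $\lambda_\theta(x) = 1 - (1-x)^\theta$, $\rho_\theta(x) = x^{1/\theta}$ for the check-concentrated distribution; the function names and the choice of test grid are hypothetical.

```python
import math

def tornado_pair(theta):
    """Heavy-tail Poisson pair: lam(x) = -ln(1-x)/theta, rho(x) = exp(theta*(x-1))."""
    def lam(x):
        return -math.log(1.0 - x) / theta
    def rho(x):
        return math.exp(theta * (x - 1.0))
    return lam, rho

def check_concentrated_pair(theta):
    """Check-concentrated pair: lam(x) = 1-(1-x)**theta, rho(x) = x**(1/theta)."""
    def lam(x):
        return 1.0 - (1.0 - x) ** theta
    def rho(x):
        return x ** (1.0 / theta)
    return lam, rho

def max_violation(lam, rho, points=901):
    """Largest |lam(1 - rho(1 - x)) - x| over a uniform grid of x in [0, 0.9]."""
    worst = 0.0
    for k in range(points):
        x = 0.9 * k / (points - 1)
        worst = max(worst, abs(lam(1.0 - rho(1.0 - x)) - x))
    return worst
```

The grid stops at 0.9 only to keep floating-point cancellation near $x = 1$ out of the picture; the identity itself holds on all of $[0, 1)$.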



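Let $\hat{\lambda}_\theta^{(N)}(x)$ be the function consisting of the first $N$ terms (up to the $x^{N-1}$ term) of the
Taylor series expansion of $\lambda_\theta(x)$ around zero, and define the normalized function $\lambda_\theta^{(N)}(x) = \hat{\lambda}_\theta^{(N)}(x) / \hat{\lambda}_\theta^{(N)}(1)$
(for large enough $N$, $\hat{\lambda}_\theta^{(N)}(1) > 0$, and so this polynomial has positive coefficients). For suitable
parameters $N, \theta$, the pair $(\lambda_\theta^{(N)}, \rho_\theta)$ will be our candidate degree distribution pair.$^{10}$ The
non-negativity of the Taylor series coefficients of $\lambda_\theta(x)$ implies that for $x \in [0,1]$, $\lambda_\theta(x) \ge \hat{\lambda}_\theta^{(N)}(x)$,
which together with (8) gives

$$x = \lambda_\theta(1 - \rho_\theta(1 - x)) \ge \hat{\lambda}_\theta^{(N)}(1 - \rho_\theta(1 - x)) = \hat{\lambda}_\theta^{(N)}(1) \, \lambda_\theta^{(N)}(1 - \rho_\theta(1 - x)) .$$

By the characterization of the threshold in Lemma 6, it follows that $\alpha^{MP}(\lambda_\theta^{(N)}, \rho_\theta) \ge \hat{\lambda}_\theta^{(N)}(1)$. Note
that the designed rate equals

$$r = r(\lambda_\theta^{(N)}, \rho_\theta) = 1 - \frac{\int_0^1 \rho_\theta(z)\,dz}{\int_0^1 \lambda_\theta^{(N)}(z)\,dz} = 1 - \hat{\lambda}_\theta^{(N)}(1) \, \frac{\int_0^1 \rho_\theta(z)\,dz}{\int_0^1 \hat{\lambda}_\theta^{(N)}(z)\,dz} .$$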

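Therefore, given a target erasure probability $\alpha$, to communicate at rates close to capacity $1 - \alpha$,
the functions $\hat{\lambda}_\theta^{(N)}$ and $\rho_\theta$ must satisfy

$$\hat{\lambda}_\theta^{(N)}(1) \approx \alpha \quad \text{and} \quad \frac{\int_0^1 \rho_\theta(z)\,dz}{\int_0^1 \hat{\lambda}_\theta^{(N)}(z)\,dz} \to 1 \ \text{ as } N \to \infty . \qquad (9)$$

For example, for the Tornado sequence, $\hat{\lambda}_\theta^{(N)}(1) = \frac{1}{\theta} \sum_{i=1}^{N-1} \frac{1}{i} = \frac{H(N-1)}{\theta}$, where $H(m)$ is the
Harmonic function. Hence, picking $\theta = \frac{H(N-1)}{\alpha}$ ensures that the threshold is at least $\alpha$. We
have $\int_0^1 \hat{\lambda}_\theta^{(N)}(z)\,dz = \frac{1}{\theta} \sum_{i=1}^{N-1} \frac{1}{i(i+1)} = \frac{N-1}{\theta N}$ and $\int_0^1 \rho_\theta(z)\,dz = \frac{1 - e^{-\theta}}{\theta}$. Therefore,

$$\frac{\int_0^1 \rho_\theta(z)\,dz}{\int_0^1 \hat{\lambda}_\theta^{(N)}(z)\,dz} = \frac{1 - e^{-H(N-1)/\alpha}}{1 - 1/N} \to 1 \ \text{ as } N \to \infty ,$$

as desired. Thus the degree distribution pair is explicitly given by

$$\lambda^{(N)}(x) = \frac{1}{H(N-1)} \sum_{i=1}^{N-1} \frac{x^i}{i} , \qquad \rho^{(N)}(x) = e^{\frac{H(N-1)}{\alpha}(x - 1)} .$$

Note that picking $N \approx 1/\epsilon$ yields a rate $(1 - \epsilon)(1 - \alpha)$ for reliable communication on BEC$_\alpha$. The
average variable node degree equals $\frac{1}{\int_0^1 \lambda^{(N)}(z)\,dz} \approx H(N-1) \approx \ln N$. Therefore, we conclude
that we achieve a rate within a multiplicative factor $(1 - \epsilon)$ of capacity with decoding complexity
$O(n \log(1/\epsilon))$.

$^{10}$ If the power series expansion of $\rho_\theta(x)$ is infinite, one can truncate it at a sufficiently high term and the claimed bound
on the threshold still applies. Of course for the check-concentrated distribution, this is not an issue!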
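The quantities in the Tornado example can be evaluated numerically for concrete parameters. The following sketch assumes the truncated pair $\lambda^{(N)}(x) = \frac{1}{H(N-1)} \sum_{i=1}^{N-1} x^i / i$ and $\rho(x) = e^{\theta(x-1)}$ with $\theta = H(N-1)/\alpha$; the function names and the grid-based threshold estimate are choices made for this illustration.

```python
import math

def harmonic(m):
    """H(m) = 1 + 1/2 + ... + 1/m."""
    return sum(1.0 / i for i in range(1, m + 1))

def tornado_design(alpha, N, grid=2000):
    """Return (theta, designed rate, threshold estimate, average variable degree)."""
    h = harmonic(N - 1)
    theta = h / alpha                        # makes the truncated sum at 1 equal alpha

    def lam(y):                              # normalized truncation lambda^(N)
        return sum(y ** i / i for i in range(1, N)) / h

    int_lam = sum(1.0 / (i * (i + 1)) for i in range(1, N)) / h   # (1 - 1/N) / H(N-1)
    int_rho = -math.expm1(-theta) / theta                        # (1 - e^-theta) / theta
    rate = 1.0 - int_rho / int_lam

    threshold = float("inf")
    for k in range(1, grid + 1):
        x = k / grid
        y = -math.expm1(-theta * x)          # 1 - rho(1 - x)
        threshold = min(threshold, x / lam(y))
    return theta, rate, threshold, 1.0 / int_lam

theta, rate, threshold, avg_degree = tornado_design(alpha=0.5, N=100)
```

With $\alpha = 0.5$ and $N = 100$, the estimated threshold stays at (essentially) $\alpha$, while the designed rate falls short of the capacity $1 - \alpha = 0.5$ by roughly $\alpha / N$.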

For the check-concentrated distribution, if we want to achieve $\alpha^{MP}(\lambda_\theta^{(N)}, \rho_\theta) \ge \alpha$ and a rate
$r \ge (1 - \epsilon)(1 - \alpha)$, then it turns out that the choice $N \approx 1/\epsilon$ and $1/\theta = \left\lceil \frac{\ln N}{-\ln(1 - \alpha)} \right\rceil$ works. In particular,
this means that the factor graph has at most $O(n \log(1/\epsilon))$ edges, and hence the "Peeling decoder"
will again run in $O(n \log(1/\epsilon))$ time.

One might wonder which of the various capacity-achieving degree distributions that might
exist for the BEC is the "best" choice. It turns out that in order to achieve a fraction
$(1 - \epsilon)$ of capacity, the average degree of the factor graph has to be $\Omega(\ln(1/\epsilon))$. This is shown
using a variant of Gallager's argument for lower bounding the gap to capacity of LDPC
codes. In fact, rather precise lower bounds on the sparsity of the factor graph are known, and the
check-concentrated distribution is optimal in the sense that it matches these bounds very closely.

In light of the above, it might seem that check-concentrated distributions are the final word in
terms of the performance-complexity trade-off. While this is true in this framework of decoding
LDPC codes, it turns out that by using more complicated graph-based codes, called Irregular
Repeat-Accumulate Codes, even better trade-offs are possible [25]. We will briefly return to this aspect in
Section 7.



6.6 Extensions to channels with errors 

Spurred by the remarkable success in achieving capacity of the BEC, Luby et al. [17]
investigated the performance of irregular LDPC codes for the BSC.

In particular, they considered the natural extension of Gallager's Algorithm B to irregular
graphs, where in iteration $i$, a variable node of degree $j$ uses a threshold $b_{i,j}$ for flipping its value.
Applying essentially the same arguments as in Section 5.3.2 but accounting for the degree
distributions, one gets the following recurrence for the expected fraction $p_\ell$ of incorrect variable-to-check
messages in the $\ell$th iteration:



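$$p_{i+1} = p_0 - \sum_{j \ge 1} \lambda_j \left[ p_0 \sum_{t = b_{i+1,j}}^{j-1} \binom{j-1}{t} \left( \frac{1 + \rho(1 - 2p_i)}{2} \right)^{t} \left( \frac{1 - \rho(1 - 2p_i)}{2} \right)^{j-1-t} - (1 - p_0) \sum_{t = b_{i+1,j}}^{j-1} \binom{j-1}{t} \left( \frac{1 - \rho(1 - 2p_i)}{2} \right)^{t} \left( \frac{1 + \rho(1 - 2p_i)}{2} \right)^{j-1-t} \right] .$$

As with the regular case, the cut-off value $b_{i+1,j}$ can then be chosen to minimize the value of $p_{i+1}$,
which is given by the smallest integer for which

$$\frac{1 - p_0}{p_0} \le \left( \frac{1 + \rho(1 - 2p_i)}{1 - \rho(1 - 2p_i)} \right)^{2 b_{i+1,j} - j + 1} .$$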

Note that $2 b_{i+1,j} - j + 1 = b_{i+1,j} - (j - 1 - b_{i+1,j})$ equals the difference between the number of check
nodes that agree in the majority and the number that agree in the minority. Therefore, a variable
node's decision in each iteration depends on whether this difference is above a certain threshold,
regardless of its degree.

Based on this, the authors of [17] develop a linear programming approach to find a good $\lambda$
given a distribution $\rho$, and use this to construct some good degree distributions. Then, using
the above recurrence, they estimate the theoretically achievable threshold crossover probability.
Following the development of the density evolution algorithm to track the performance of belief
propagation decoding [23], the authors of [22] used optimization techniques to find good irregular
degree distributions for belief propagation decoding. The BIAWGN channel was the primary focus
in [22], but the authors also list a few examples that demonstrate the promise of the techniques
for other channels. In particular, for the BSC with rate 1/2, they report a degree distribution pair
with maximum variable node degree 75 and check-node distribution $\rho(x) = 0.25 x^9 + 0.75 x^{10}$ for
which the computed threshold is 0.106, which is quite close to the Shannon capacity limit 0.11.
The techniques were further refined and codes with rate 1/2 and a threshold of $\sigma^* \approx 0.9781$ (whose
SNR is within 0.0045 dB of capacity) were reported for the BIAWGN in [3]; these codes use only
two different check node degrees $j, j + 1$ for some integer $j \ge 2$.
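The recurrence for the irregular extension of Gallager's Algorithm B, with the cut-off chosen by the smallest-integer rule, can be iterated directly. The sketch below is a hedged illustration: the function names and the dictionary encoding of $(\lambda, \rho)$ are assumptions, and when no cut-off below $j$ satisfies the inequality the code uses the cut-off $j$, so that the inner sums are empty and nodes of that degree never flip.

```python
from math import comb

def rho_eval(rho, x):
    """Evaluate sum_d rho[d] * x**(d-1)."""
    return sum(c * x ** (d - 1) for d, c in rho.items())

def cutoff(j, p0, q_plus, q_minus):
    """Smallest b with (1-p0)/p0 <= (q_plus/q_minus)**(2b - j + 1); j if none below j."""
    for b in range(j):
        e = 2 * b - j + 1
        # Compare without dividing, so q_minus == 0 is handled.
        if e >= 0 and (1.0 - p0) * q_minus ** e <= p0 * q_plus ** e:
            return b
    return j

def algorithm_b_step(lam, rho, p0, p):
    """One step p_i -> p_{i+1} of the recurrence for irregular Algorithm B."""
    r = rho_eval(rho, 1.0 - 2.0 * p)
    q_plus, q_minus = (1.0 + r) / 2.0, (1.0 - r) / 2.0
    total = 0.0
    for j, lam_j in lam.items():
        b = cutoff(j, p0, q_plus, q_minus)
        corrected = sum(comb(j - 1, t) * q_plus ** t * q_minus ** (j - 1 - t)
                        for t in range(b, j))
        corrupted = sum(comb(j - 1, t) * q_minus ** t * q_plus ** (j - 1 - t)
                        for t in range(b, j))
        total += lam_j * (p0 * corrected - (1.0 - p0) * corrupted)
    return p0 - total

def run_algorithm_b(lam, rho, p0, iterations):
    """Iterate the recurrence starting from p_0 and return the final value."""
    p = p0
    for _ in range(iterations):
        p = algorithm_b_step(lam, rho, p0, p)
    return p
```

For the regular special case `lam = {3: 1.0}`, `rho = {6: 1.0}`, the iteration decays toward 0 from $p_0 = 0.03$ and does not move from $p_0 = 0.05$.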

7 Linear encoding time and Repeat-Accumulate Codes

The linear decoding complexity of LDPC codes is one of their attractive features. Being linear 
codes, they generically admit quadratic time encoding. In this section, we briefly discuss how the 
encoding complexity can be improved, and give pointers to where results in this vein can be found 
in more detail. 

The original Tornado codes paper achieved linear time encoding using a cascade of several
low-density generator matrix (LDGM) codes. In LDGM codes, the "factor" graph is actually used
to compute actual check bits from the $k$ message bits (instead of specifying parity checks that the
codeword bits must obey). Due to the sparse nature of the graph, the check bits can be computed
in linear time. These check bits are then used as message bits for the next layer, and so on, until the
number of check bits becomes $O(\sqrt{k})$. This final set of check bits is encoded using a quadratic
time encodable linear code.

We now mention an alternate approach to achieve linear time encoding for LDPC codes
themselves (and not a cascaded variant as in the Tornado construction), based on finding a sparse parity check matrix with
additional nice properties. Let $H \in \mathbb{F}_2^{m \times n}$ be the parity check matrix of an LDPC code of
dimension $n - m$. By means of row and column operations, we can convert $H$ into a form $\tilde{H}$ where
the last $m$ columns are linearly independent, and moreover the $m \times m$ submatrix consisting of
the last $m$ columns is lower triangular (with 1's on the diagonal). Using $\tilde{H}$, it is a simple matter
of "back-substitution" to compute the $m$ parity bits corresponding to the $n - m$ information bits
(the encoding is systematic). The complexity of this encoding is governed by the number of 1's
in $\tilde{H}$. In general, however, when we begin with a sparse $H$, the resulting matrix $\tilde{H}$ is no longer
sparse. In a beautiful paper [24], Richardson and Urbanke propose finding an "approximate" lower
triangulation of the parity check matrix that is still sparse. The idea is to make the top right
$(m - g) \times (m - g)$ corner of the matrix lower triangular for some small "gap" parameter $g$. The
encoding can be done in $O(n + g^2)$ time, which is linear if $g = O(\sqrt{n})$. Remarkably, for several
distribution pairs $(\lambda, \rho)$, including all the optimized ones listed in [22], it is shown in [24] that, with
high probability over the choice of the code from the ensemble LDPC$(n, \lambda, \rho)$, a gap of $O(\sqrt{n})$ can
in fact be achieved, thus leading to linear encoding complexity!
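The "back-substitution" step for a fully lower triangular form can be sketched as follows. This is an illustration only, not the approximate triangulation of Richardson and Urbanke: the function name is hypothetical, the matrix is stored densely as a list of 0/1 rows for readability, and a sparse row representation would be needed for the cost to be proportional to the number of 1's.

```python
def encode_systematic(H, info_bits):
    """Encode with a parity check matrix whose last m columns are lower triangular
    with 1's on the diagonal. Returns info_bits followed by the m parity bits."""
    m, n = len(H), len(H[0])
    k = n - m
    if len(info_bits) != k:
        raise ValueError("expected n - m information bits")
    codeword = list(info_bits) + [0] * m
    for r in range(m):
        # Row r reads: sum_{c < k + r} H[r][c] * x_c + x_{k + r} = 0 over GF(2),
        # so the r-th parity bit is determined by the bits already known.
        acc = 0
        for c in range(k + r):
            acc ^= H[r][c] & codeword[c]
        codeword[k + r] = acc
    return codeword

H_example = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [1, 0, 1, 0, 1, 1],
]
codeword = encode_systematic(H_example, [1, 0, 1])
```

Each parity bit depends only on the information bits and on parity bits computed earlier, which is exactly what the triangular structure guarantees.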

Yet another approach to achieve linear encoding complexity, which we would like to focus on (as it
has some additional applications), is to use Irregular Repeat-Accumulate (IRA) codes. IRA codes
were introduced by Jin, Khandekar and McEliece by generalizing the notion of
Repeat-Accumulate codes in conjunction with ideas from the study of irregular LDPC codes.

IRA codes are defined as follows. Let $(\lambda, \rho)$ be a degree distribution pair. Pick a random
bipartite graph $G$ with $k$ information nodes on the left (with a fraction $\lambda_i$ of the edges being incident
on information nodes of degree $i$), and $n > k$ check nodes on the right (with a fraction $\rho_i$ of the
edges being incident on check nodes of degree $i$). Actually, it turns out that one can pick
the graph to be regular on the check node side and still achieve capacity, so we can even restrict
ourselves to check-degree distributions given by $\rho_a = 1$ for some integer $a$. Using $G$, the encoding
of the IRA code (of dimension $k$ and block length $n$) proceeds as follows:

• Place the $k$ message bits on the $k$ information nodes.

• For $1 \le i \le n$, at the $i$'th check node, compute the bit $v_i \in \{1, -1\}$ which equals the parity
(i.e., product, in $\pm 1$ notation) of the message bits placed on its neighbors.

• (Accumulation step) Output the codeword $(w_1, w_2, \ldots, w_n)$ where $w_j = \prod_{i=1}^{j} v_i$. In other
words, we accumulate the parities of the prefixes of the bit sequence $(v_1, v_2, \ldots, v_n)$.

Note that the encoding takes $O(n)$ time. Each of the check nodes has constant degree, and
thus the $v_i$'s can be computed in linear time. The accumulation step can then be performed using
additional $O(n)$ operations.
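The encoding steps above can be sketched in a few lines. The function name and the adjacency-list input format (for each check node, the list of information nodes it touches) are assumptions of this example; bits are in $\pm 1$ notation as in the text.

```python
from itertools import accumulate
from math import prod
from operator import mul

def ira_encode(check_neighbors, message):
    """check_neighbors[i] lists the information nodes adjacent to check node i.
    message holds the k information bits as +1 / -1 values."""
    # Parity at each check node: product of the +1/-1 bits on its neighbors.
    v = [prod(message[u] for u in nbrs) for nbrs in check_neighbors]
    # Accumulation step: w_j is the product of v_1, ..., v_j.
    return list(accumulate(v, mul))

codeword = ira_encode([[0, 1], [1, 2], [0, 2], [0, 1]], [1, -1, -1])
# codeword == [-1, -1, 1, -1]
```

Both passes touch each edge and each check node once, matching the linear-time claim.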

It is not hard to show that the rate of the IRA code corresponding to a pair $(\lambda, \rho)$ as defined
above equals $\frac{\int_0^1 \lambda(z)\,dz}{\int_0^1 \rho(z)\,dz}$.
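One way to see the rate formula is a short edge-counting argument. In the sketch below, $E$ denotes the number of edges of $G$ (a symbol introduced here for illustration), and the convention $\lambda(z) = \sum_i \lambda_i z^{i-1}$, $\rho(z) = \sum_i \rho_i z^{i-1}$ is assumed.

```latex
\begin{align*}
k &= \sum_i \frac{E \lambda_i}{i} = E \int_0^1 \lambda(z)\,dz ,
  && \text{(each degree-$i$ information node accounts for $i$ edges)} \\
n &= \sum_i \frac{E \rho_i}{i} = E \int_0^1 \rho(z)\,dz ,
  && \text{(likewise on the check node side)} \\
\frac{k}{n} &= \frac{\int_0^1 \lambda(z)\,dz}{\int_0^1 \rho(z)\,dz} .
\end{align*}
```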

A natural iterative decoding algorithm for IRA codes has been presented and analyzed in the literature
(a description also appears in [21]). The iterative algorithm uses a graphical model for message passing
that includes the above bipartite graph $G$ connecting information nodes to check nodes, juxtaposed
with another bipartite graph connecting the check nodes to $n$ code nodes labeled $x_1, x_2, \ldots, x_n$. In
this graph, which is intended to reflect the accumulation process, code node $x_i$ for $1 \le i < n$ is
connected to the $i$'th and $(i + 1)$'th check nodes (ones where $v_i, v_{i+1}$ are computed), and node $x_n$
is connected to the check node where $v_n$ is computed.

It is proved that for the above non-systematic IRA codes, the iterative decoding
on BEC$_\alpha$ converges to vanishing bit-erasure probability as the block length $n \to \infty$, provided

$$\lambda\left( 1 - \left[ \frac{1 - \alpha}{1 - \alpha R(1 - x)} \right]^2 \rho(1 - x) \right) < x \quad \forall x \in (0, 1] . \qquad (10)$$

In the above, $R(x) = \sum_{i=1}^{\infty} R_i x^i$ is the power series whose coefficient $R_i$ equals the fraction of check
nodes that are connected to $i$ information nodes in $G$. We have $R(x) = \frac{\int_0^x \rho(z)\,dz}{\int_0^1 \rho(z)\,dz}$.

Using the above characterization, degree distribution pairs $(\lambda, \rho)$ for IRA codes that achieve
the capacity of the BEC have been found in prior work, including [21].$^{11}$ In particular, we want to draw attention
to the construction in [21] with $\rho(x) = x^2$ that can achieve a rate of $(1 - \epsilon)(1 - \alpha)$, i.e., within a
$(1 - \epsilon)$ multiplicative factor of the capacity of the BEC, for $\alpha \in [0, 0.95]$.$^{12}$ Since $\rho(x) = x^2$, all
check nodes are connected to exactly 3 information nodes. Together with the two code nodes they
are connected to, each check node has degree 5 in the graphical model used for iterative decoding.
The total number of edges in the graphical model is thus $5n$, and this means that the complexity of
the encoder as well as the "Peeling" implementation of the decoder is at most $5n$. In other words,
the complexity per codeword bit of encoding and decoding is bounded by an absolute constant,
independent of the gap $\epsilon$ to capacity.

$^{11}$ Actually, these papers work with a systematic version of IRA where the codeword includes the message bits in
addition to the accumulated check bits $x_1, \ldots, x_n$. Such systematic codes have rate equal to $\left( 1 + \frac{\int_0^1 \rho(z)\,dz}{\int_0^1 \lambda(z)\,dz} \right)^{-1}$, and
the decoding success condition (10) for them is slightly different, with a factor $\alpha$ multiplying the $\lambda(\cdot)$ term on the
left hand side.

8 Summary 

We have seen that LDPC codes together with natural message-passing algorithms constitute a
powerful approach to the channel coding problem, and can approach the capacity of a variety of channels.
For the particularly simple binary erasure channel, irregular LDPC codes with carefully tailored 
degree distributions can be used to communicate at rates arbitrarily close to Shannon capacity. 
Despite the impressive strides in the asymptotic analysis of iterative decoding of irregular LDPC 
codes, for all nontrivial channels except for the BEC, it is still unknown if there exist sequences 
of degree distributions that can get arbitrarily close to the Shannon limit. By optimizing degree 
distributions numerically and then computing their threshold (either using explicit recurrences or 
using the density evolution algorithm), various rather excellent bounds on thresholds are known 
for the BSC and BIAWGN. These, however, still do not come close to answering the big theoretical 
open question on whether there are capacity-achieving ensembles of irregular LDPC codes (say for 
the BSC), nor do they provide much insight into their structure. 

For irregular LDPC codes, we have explicit sequences of ensembles of codes that achieve the 
capacity of the BEC (and come pretty close for the BSC and the BIAWGN channel). The codes 
themselves are not fully explicit, but rather sampled from the ensemble. While the concentration 
bounds guarantee that almost all codes from the ensemble are likely to be good, it may still be 
nice to have an explicit family of codes (rather than ensembles) with these properties. Even for 
achieving capacity of the BEC, the only known "explicit" codes require a brute-force search for
a rather large constant sized code, and the dependence of the decoding complexity on the gap $\epsilon$
to capacity is not as good as for irregular LDPC ensembles. For the case of errors, achieving a
polynomial dependence on the gap $\epsilon$ to capacity remains an important challenge.